Processing quantified images into an experimental series
Processing quantified images (.CPfluor files) with processData to create an experimental series (.CPseries file)
The image quantification pipeline currently outputs a set of CPfluor files, each representing the quantification of a single image. Before starting either the binding series pipeline or the off-rate pipeline, you must generate a set of CPseries files that represent the concatenated tiles across a binding series.
Note: Examples are in the /lab/sarah/RNAarray/example directory on greendragon.
Required and optional inputs to run the process data script:
- A map file that gives the set of directories in which to look for CPfluor files (`-mf`).
- [optional] A way of determining which clusters to actually look at (`-cf`). If not given, all clusters will be examined.
- [optional] An output directory (`-od`).
Define image directories in the form of a 'map' file (`-mf` or `--map_CPfluors`). The format is as follows:
row | description |
---|---|
0 | absolute path of base directory |
1 | directory name containing red channel CPfluors |
2- | directory names of green channel CPfluors |
cat bindingseries.map
produces output:
/lab/sarah/RNAarray/example
00_0nM_red/CPfluor
01_0.91nM_green/CPfluor
02_2.73nM_green/CPfluor
03_8nM_green/CPfluor
04_24nM_green/CPfluor
05_74nM_green/CPfluor
06_222nM_green/CPfluor
07_666nM_green/CPfluor
08_2uM_green/CPfluor
cat offrates.map
produces output:
/lab/sarah/RNAarray/example
00_0nM_red/CPfluor
08_offrates_green/CPfluor/
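The map file layout described above can be read with a few lines of Python. This is only a minimal sketch of the row convention (row 0 = base directory, row 1 = red channel, rows 2 onward = green channel); `parse_map_file` is a hypothetical helper written for illustration and is not a function of the processData script.

```python
from pathlib import PurePosixPath


def parse_map_file(text):
    """Split map-file text into (red_dir, green_dirs) as full paths.

    Hypothetical helper: row 0 is the base directory, row 1 the red-channel
    directory, and rows 2 onward the green-channel directories.
    """
    # Ignore blank lines and surrounding whitespace.
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("map file needs a base directory and a red-channel directory")
    base = PurePosixPath(rows[0])
    red_dir = base / rows[1]
    green_dirs = [base / name for name in rows[2:]]
    return red_dir, green_dirs


example = """/lab/sarah/RNAarray/example
00_0nM_red/CPfluor
08_offrates_green/CPfluor/
"""
red_dir, green_dirs = parse_map_file(example)
print(red_dir)
for directory in green_dirs:
    print(directory)
```

An off-rate map file yields a single green-channel directory, while a binding series map file yields one per concentration point.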
In practice, often only a subset of the quantified clusters represents members of our library. The script provides multiple options for specifying which clusters to keep. The simplest is to provide a CPannot file, which defines how clusters map to library members. The program finds the clusters present in the index of the CPannot file (given with the `-cf` or `--clusters_to_keep_file` input) and uses only those clusters. For more information on generating a CPannot file, refer to Processing sequence data. For an example of this file, see example CPannot file.
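Conceptually, selecting clusters with a CPannot file amounts to keeping only the clusters whose IDs appear in its index. The sketch below illustrates that idea with plain Python containers. The function name, the cluster IDs, and the dictionary representation are all assumptions made for illustration; the real files are pickled tables and the script's internals may differ.

```python
def keep_annotated_clusters(signal_by_cluster, annotated_ids):
    """Return only the entries whose cluster ID is in annotated_ids.

    Hypothetical helper: signal_by_cluster maps cluster ID -> list of
    fluorescence values; annotated_ids stands in for the CPannot index.
    """
    keep = set(annotated_ids)
    return {
        cluster_id: values
        for cluster_id, values in signal_by_cluster.items()
        if cluster_id in keep
    }


# Made-up cluster IDs and values, for illustration only.
all_clusters = {
    "cluster_A": [0.1, 0.4, 0.9],
    "cluster_B": [0.2, 0.2, 0.3],
    "cluster_C": [0.0, 0.5, 1.1],
}
annot_index = ["cluster_A", "cluster_C", "cluster_Z"]  # cluster_Z was never imaged

kept = keep_annotated_clusters(all_clusters, annot_index)
print(sorted(kept))
```

Annotated clusters that were never quantified (here `cluster_Z`) are simply absent from the result in this sketch.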
The other option is to provide a filter name or a list of filter names (`-fp` or `--filter_pos` input), and the associated filtered CPseq files (`-fs` or `--filtered_CPseqs`). The program will then find clusters in the filtered CPseq files that have any of those filters and keep only those clusters.
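The "any of those filters" rule can be sketched as follows. The layout assumed here (tab-delimited lines with the cluster ID in the first column and colon-separated filter names in the second) is an assumption for illustration and should be checked against your own CPseq files; `clusters_with_any_filter` is a hypothetical helper, not part of the processData script.

```python
def clusters_with_any_filter(cpseq_lines, filter_names):
    """Return cluster IDs whose filter field contains any requested filter.

    Assumed layout (not confirmed by this page): tab-delimited lines,
    column 0 = cluster ID, column 1 = colon-separated filter names.
    """
    wanted = set(filter_names)
    kept = []
    for line in cpseq_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip lines without a filter column
        cluster_id, filters = fields[0], fields[1]
        # Keep the cluster if it shares at least one filter with the request.
        if wanted & set(filters.split(":")):
            kept.append(cluster_id)
    return kept


# Made-up lines, for illustration only.
lines = [
    "cluster_A\tanyRNA:other\n",
    "cluster_B\t\n",
    "cluster_C\tcontrol\n",
]
selected = clusters_with_any_filter(lines, ["anyRNA"])
print(selected)
```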
# to process binding series file
python -m processData -mf bindingseries.map -od bindingCurves -cf anyRNA.CPannot.pkl
# to process off rates
# -r will save the time stamps per tile.
python -m processData -mf offrates.map -od offRates -cf anyRNA.CPannot.pkl -r
There will be several outputs.
ls bindingCurves/
should show:
CPseries/ # directory containing the per-tile CPseries, before any filtering, of the binding series (green channel)
redCPseries/ # directory containing the per-tile CPseries, before any filtering, of the all-cluster image (red channel)
AKPP5_ALL_Bottom_filtered_reduced.CPseries.pkl # Concatenated, reduced CPseries file of only the clusters of interest (green channel).
AKPP5_ALL_Bottom_filtered_reduced_red.CPseries.pkl # Concatenated, reduced CPseries file of only the clusters of interest (red channel).
AKPP5_ALL_Bottom_filtered_reduced.CPtiles.pkl # File mapping clusterID of the concatenated, reduced CPseries file to the original tile.
And
ls offRates/
should show all of the above files and directories, plus:
rates.timeDict.p # Dictionary keyed on tile, with values = array of times corresponding to columns of the resulting CPseries file (in seconds).
Note: for the off rates, different tiles may have different numbers of images. This is expected and is supported by the package. The resulting CPseries file will have as many columns as the largest number of images for any tile, and missing values are filled with NaN.
This script could use further development, though what is needed depends partly on future changes to the image quantification pipeline. Even now, it would make sense to process the red and green channels separately to allow more flexible experimental designs.
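The NaN padding described in the note can be illustrated with a small sketch: each row is extended to the width of the longest row, and the missing entries become NaN. `pad_rows_to_widest` is a hypothetical helper and the values are made up; it shows the padding rule only, not how the package builds the CPseries file.

```python
import math


def pad_rows_to_widest(rows):
    """Right-pad every row with NaN so all rows match the longest one."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [math.nan] * (width - len(row)) for row in rows]


# One cluster from a tile imaged three times, one from a tile imaged once.
signals = [
    [1.0, 0.8, 0.5],
    [0.9],
]
padded = pad_rows_to_widest(signals)
for row in padded:
    print(row)
```

Because the padded columns are NaN rather than zero, downstream fits need to ignore missing time points instead of treating them as measured signal.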