Plotting fits
When you are done with the binding curve or offrate pipelines, you may wish to plot your fit curves to evaluate how good they look.
Two scripts in the `array_fitting_tools/plotscripts` folder help you plot binding curves or offrate curves (`plotBindingCurves` and `plotOffrateCurves`, respectively). These scripts plot either the per-cluster or the per-variant fits, depending on the user inputs.
These scripts let you plot from the command line. In practice, you may prefer to load the files in an IPython notebook, for example, and plot interactively. Each script essentially just loads the files and instantiates a class that has the plotting function. Please feel free to use this class in whatever context you find more appropriate. Even better, document here how you did it, so other people can do the same!
For all plots shown below, the black dots represent the median fluorescence across single clusters at that x, and the error bars represent the 95% confidence intervals on the median fluorescence. The red line indicates the fit. The gray shaded area represents the 95% confidence interval on the fit, including the bounds on fmin and fmax.
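As an illustration of what the black dots and error bars summarize, here is a minimal sketch that computes a median and a 95% confidence interval on that median with a percentile bootstrap. This is an assumption about one reasonable way to get such an interval, not necessarily how the plotting class computes it; the function name `median_with_ci` and the fluorescence values are made up for the example.

```python
import random
import statistics

def median_with_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Return (median, lower, upper) using a percentile bootstrap.

    Hypothetical helper for illustration; not part of array_fitting_tools.
    """
    rng = random.Random(seed)
    values = list(values)
    # Resample with replacement and record the median of each resample.
    boot_medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap medians.
    lower = boot_medians[int((alpha / 2) * n_boot)]
    upper = boot_medians[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return statistics.median(values), lower, upper

# Made-up fluorescence values for the clusters of one variant at one x.
fluorescence = [0.82, 0.91, 0.77, 1.05, 0.88, 0.95, 0.79, 0.99]
median, lower, upper = median_with_ci(fluorescence)
print(f"median={median:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

The interval is on the median itself, so it narrows as more clusters contribute to a point.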
Let's say you have a CPannot file that looks like the following:
clusterID variant_number
M00653:72:000000000-AKPP5:1:2101:19526:15124 0
M00653:72:000000000-AKPP5:1:2102:4005:7276 0
M00653:72:000000000-AKPP5:1:2102:14626:8517 0
M00653:72:000000000-AKPP5:1:2102:19935:11152 0
M00653:72:000000000-AKPP5:1:2102:23710:15626 0
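If you later want to plot individual clusters, it helps to know which clusterIDs belong to a given variant. The sketch below groups the example rows above by variant number. It assumes the table is available as whitespace-separated text; the real CPannot file is a pickle and would need to be loaded differently. The helper name `clusters_by_variant` is hypothetical.

```python
from collections import defaultdict

# The example rows shown above, as whitespace-separated text (an assumption;
# the actual CPannot file is a pickle).
cpannot_text = """\
M00653:72:000000000-AKPP5:1:2101:19526:15124 0
M00653:72:000000000-AKPP5:1:2102:4005:7276 0
M00653:72:000000000-AKPP5:1:2102:14626:8517 0
M00653:72:000000000-AKPP5:1:2102:19935:11152 0
M00653:72:000000000-AKPP5:1:2102:23710:15626 0
"""

def clusters_by_variant(text):
    """Map each variant number to the list of its clusterIDs."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cluster_id, variant = line.split()
        groups[int(variant)].append(cluster_id)
    return dict(groups)

groups = clusters_by_variant(cpannot_text)
print(len(groups[0]), "clusters annotated as variant 0")
```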
You would like to plot the final fit of variant 0, after the bootstrapping has occurred. Enter:
basename=bindingCurves/AKPP5_ALL_Bottom_filtered_reduced_normalized
python -m plotBindingCurves -f $basename.CPvariant -cs $basename.CPseries.pkl -v 0 -an anyRNA.CPannot.pkl -c concentrations.txt -out fitsPlotted --annotate
In the directory `fitsPlotted`, you should have a file called `binding_curve.0.pdf` that looks like:
If you want to plot multiple variants, e.g. variants 55 and 176 in addition to variant 0 (chosen here to show a range of Kds), enter them space-separated in the `-v` input:
basename=bindingCurves/AKPP5_ALL_Bottom_filtered_reduced_normalized
python -m plotBindingCurves -f $basename.CPvariant -cs $basename.CPseries.pkl -v 0 55 176 -an anyRNA.CPannot.pkl -c concentrations.txt -out fitsPlotted --annotate
More plots are in the `fitsPlotted` directory:
55 | 176 |
---|---|
(binding curve plot, variant 55) | (binding curve plot, variant 176) |
Note that the fit on the left (variant 55) indicates that fmax was enforced from the distribution of fmaxes.
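When the list of variants is long, the command line above can be assembled programmatically. The sketch below only builds the argument list from the flags used in the examples above; the function name `binding_curve_command` and its default values are illustrative, not part of the package.

```python
import shlex

def binding_curve_command(basename, variants,
                          annot="anyRNA.CPannot.pkl",
                          concentrations="concentrations.txt",
                          out_dir="fitsPlotted"):
    """Build the plotBindingCurves argument list for per-variant plots.

    Hypothetical helper; it only mirrors the flags shown in the examples.
    """
    return [
        "python", "-m", "plotBindingCurves",
        "-f", f"{basename}.CPvariant",
        "-cs", f"{basename}.CPseries.pkl",
        "-v", *[str(v) for v in variants],  # space-separated variant numbers
        "-an", annot,
        "-c", concentrations,
        "-out", out_dir,
        "--annotate",
    ]

cmd = binding_curve_command(
    "bindingCurves/AKPP5_ALL_Bottom_filtered_reduced_normalized",
    [0, 55, 176],
)
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` from the directory where the scripts are importable.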
If instead you would like to plot each of the individual cluster fits, you can run the same command but with slightly different inputs:
- The `-f` or `--variant_file` input should be changed to the CPfitted file that was the output of the singleClusterFits script.
- The `-v` or `--variant_number` input should be changed to the clusterID(s) that you want to plot.
- The `-an` or `--annotated_clusters` input should not be provided.
To plot four of the single cluster fits of variant 0:
basename=bindingCurves/AKPP5_ALL_Bottom_filtered_reduced_normalized
python -m plotBindingCurves -f $basename.CPfitted.pkl -cs $basename.CPseries.pkl -v M00653:72:000000000-AKPP5:1:2101:19526:15124 M00653:72:000000000-AKPP5:1:2102:4005:7276 M00653:72:000000000-AKPP5:1:2102:14626:8517 M00653:72:000000000-AKPP5:1:2102:19935:11152 -c concentrations.txt -out fitsPlotted --annotate
This produces the following four plots:




To plot the offrate curves of variants 0 and 176, use the `plotOffrateCurves` script:
basename=offRates/AKPP5_ALL_Bottom_filtered_reduced
normbasename=offRates/AKPP5_ALL_Bottom_filtered_reduced_normalized
python -m plotOffrateCurves -f $normbasename.CPvariant -cs $normbasename.CPseries.pkl -v 0 176 -an anyRNA.CPannot.pkl -td offRates/rates.timeDict.p -ts $basename.CPtiles.pkl -out fitsPlotted --annotate
Produces the following output:


By default, all of the tiles are plotted on the same plot, each with its own times and error bars. This can be difficult to read and interpret, so there are some options:
- You can plot only the N tiles with the most clusters (`-n N` or `--numtiles N`).
- You can plot only particular tiles (`-t 001 002` or `--tile 001 002` to plot tiles 1 and 2).
Plotting the top 2 tiles with the most clusters (`-n 2`):


Plotting just tile 001 (`-t 001`):


Note: all of the above have the same fit (i.e. red line) regardless of what subset of the data is plotted.
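To make the tile selection concrete, here is a minimal sketch of choosing the N tiles with the most clusters and keeping only the clusters on those tiles. It is an assumption about what the `-n` option does conceptually, not the script's actual code; the cluster names and the cluster-to-tile mapping are invented for the example.

```python
from collections import Counter

def top_tiles(cluster_tiles, n):
    """Return the n tile names that contain the most clusters.

    Hypothetical helper; cluster_tiles maps clusterID -> tile name.
    """
    counts = Counter(cluster_tiles.values())
    return [tile for tile, _ in counts.most_common(n)]

# Invented mapping: three clusters on tile 001, two on 002, one on 003.
cluster_tiles = {
    "cluster_a": "001",
    "cluster_b": "001",
    "cluster_c": "001",
    "cluster_d": "002",
    "cluster_e": "002",
    "cluster_f": "003",
}

selected = top_tiles(cluster_tiles, 2)
# Keep only the clusters that sit on one of the selected tiles.
kept = {c: t for c, t in cluster_tiles.items() if t in selected}
print(selected, len(kept))
```

Only the plotted data points change with the selection; as noted above, the fit itself stays the same.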
Analogously to the binding curve data, you can plot single-cluster fits with the same script by passing the clusterID(s) to `-v` and omitting `-an`:
basename=offRates/AKPP5_ALL_Bottom_filtered_reduced
normbasename=offRates/AKPP5_ALL_Bottom_filtered_reduced_normalized
python -m plotOffrateCurves -f $normbasename.CPvariant -cs $normbasename.CPseries.pkl -v M00653:72:000000000-AKPP5:1:2101:23567:6964 M00653:72:000000000-AKPP5:1:2111:29470:12239 M00653:72:000000000-AKPP5:1:2116:2661:17489 M00653:72:000000000-AKPP5:1:2104:24780:13535 -td offRates/rates.timeDict.p -ts $basename.CPtiles.pkl -out fitsPlotted --annotate