Fitting an off rate series

sjkdenny edited this page Aug 18, 2016 · 17 revisions

You ran an off-rate experiment! Starting with fluorescent ligand bound to clusters, you allowed it to dissociate over time, and you now have an image series representing the fraction of ligand bound to clusters at increasing times. This page walks through an example of how to quantify the off rate of the ligand, starting from quantified images.

Note:

All of the files generated or used in this example are in /lab/sarah/RNAarray/example/ on greendragon.

Input files

This pipeline requires the following input files:

  1. Fluorescence values.
    • A CPseries file: a pandas DataFrame, indexed on the clusterID, with one column of fluorescence values per time point.
    • Ideally this file includes only the clusters you would like to fit.
  2. Time series per tile.
    • A dictionary (saved as a pickle with extension ".p") giving the per-tile times associated with the columns of the CPseries file.
  3. Tile series.
    • A CPtiles file: a pandas Series, indexed on the clusterID, whose values give the tile each cluster belongs to.
  4. Library members.
    • A CPannot file: a pandas DataFrame, indexed on the clusterID, with a single column 'variant' that indicates which library member is at that cluster.
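To make the shapes of these four inputs concrete, here is a minimal sketch that builds toy versions of each in memory. The cluster names, fluorescence values, column labels, and the assumption that times are in seconds are all made up for illustration; only the overall structure (index, columns, dict of per-tile times) follows the descriptions above.

```python
import pandas as pd

# Hypothetical cluster IDs, for illustration only.
cluster_ids = pd.Index(['cluster_A', 'cluster_B', 'cluster_C'], name='clusterID')

# CPseries: one row per cluster, one column per time point (column labels assumed).
cp_series = pd.DataFrame(
    [[1.00, 0.70, 0.45],
     [0.90, 0.80, 0.72],
     [1.20, 0.50, 0.25]],
    index=cluster_ids, columns=[0, 1, 2])

# CPtiles: the tile each cluster sits on.
cp_tiles = pd.Series(['001', '002', '002'], index=cluster_ids, name='tile')

# CPannot: a single 'variant' column naming the library member at each cluster.
cp_annot = pd.DataFrame({'variant': [0, 77, 77]}, index=cluster_ids)

# Time dict: per-tile acquisition times (assumed seconds), one per CPseries column.
time_dict = {'001': [0.0, 236.843, 467.984],
             '002': [6.546, 243.375, 476.046]}

# Consistency checks worth running before fitting.
assert cp_series.index.equals(cp_tiles.index)
assert cp_series.index.equals(cp_annot.index)
for tile in cp_tiles.unique():
    assert len(time_dict[tile]) == cp_series.shape[1]
```

Checks like the last three catch the most common mismatch, a time dict whose lists are longer or shorter than the number of CPseries columns, before any fitting starts.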

If you don't yet have these files, refer to the pages on processing CPfluor files ([data preprocessing](Processing quantified images into an experimental series)) and on mapping barcodes (sequence preprocessing).

Example time dict file:

from fittinglibs import fileio
# load the pickled dictionary of per-tile times
timeDict = fileio.loadFile('offRates/rates.timeDict.p')
print(timeDict)

This prints one list of time points per tile, keyed by tile number:

{'003': [14.046, 251.281, 481.828, 718.75, 951.187, 1180.671, 1423.156, 1774.828, 2006.906, 2245.14, 2770.093, 3603.078, 4437.843, 5676.765, 6507.281, 7334.359, 8176.375, 9011.875, 9846.234, 14248.453], '005': [30.171, 265.515, 496.578, 732.031, 965.734, 1195.265, 1437.921, 1789.187, 2022.031, 2258.734, 2785.328, 3618.671, 4451.609, 5690.953, 6521.984, 7349.703, 8191.953, 9026.859, 9860.515, 14275.656], '002': [6.546, 243.375, 476.046, 711.281, 945.625, 1171.562, 1415.921, 1767.656, 2000.187, 2237.375, 2761.953, 3594.781, 4430.765, 5669.468, 6500.828, 7325.796, 8170.765, 9003.375, 9838.875, 14237.156], '001': [0.0, 236.843, 467.984, 703.843, 938.812, 1165.984, 1407.875, 1758.843, 1992.937, 2229.937, 2755.375, 3588.078, 4423.875, 5662.453, 6492.375, 7318.14, 8163.687, 8995.953, 9831.593, 14224.593], '010': [65.656, 300.39, 532.296, 767.046, 999.843, 1234.968, 1471.828, 1824.453, 2054.718, 2293.015, 2823.046, 3656.265, 4486.765, 5725.921, 6554.687, 7388.39, 8226.671, 9059.859, 9899.031, 14337.0], '011': [71.859, 308.093, 539.812, 775.046, 1006.562, 1242.078, 1478.281, 1831.421, 2061.781, 2299.937, 2829.843, 3664.0, 4493.843, 5733.515, 6561.953, 7396.281, 8232.953, 9066.64, 9906.656, 14350.328], '012': [78.906, 315.25, 546.843, 781.796, 1012.25, 1249.421, 1485.421, 1838.031, 2067.875, 2307.296, 2836.875, 3670.75, 4500.0, 5741.281, 6568.328, 7403.953, 8240.343, 9073.531, 9913.109, 14361.734], '013': [86.828, 322.265, 553.656, 789.265, 1019.328, 1257.265, 1492.562, 1844.421, 2074.453, 2313.828, 2843.453, 3677.281, 4506.796, 5748.343, 6575.703, 7411.718, 8248.171, 9080.328, 9921.25, 14375.359], '007': [45.687, 280.296, 510.968, 745.89, 978.515, 1210.562, 1451.671, 1804.218, 2033.953, 2274.515, 2799.984, 3632.515, 4465.14, 5704.671, 6535.671, 7365.625, 8206.328, 9039.437, 9875.109, 14300.093], '006': [38.265, 272.765, 503.468, 737.968, 970.984, 1202.656, 1445.156, 1796.625, 2027.375, 2266.953, 2792.453, 3625.968, 4459.14, 5698.828, 6529.484, 7357.734, 8198.828, 9032.875, 
9867.765, 14287.156], '016': [110.453, 343.5, 574.796, 811.984, 1039.265, 1278.39, 1513.906, 1866.125, 2094.656, 2333.343, 2864.25, 3698.046, 4526.859, 5769.968, 6595.468, 7433.609, 8270.781, 9104.218, 9944.468, 14412.234], '004': [21.859, 258.515, 489.703, 724.578, 957.984, 1187.843, 1430.234, 1782.187, 2014.343, 2251.078, 2778.25, 3611.046, 4444.546, 5684.031, 6514.328, 7342.187, 8184.046, 9019.609, 9852.843, 14262.281], '014': [94.953, 328.046, 561.39, 796.546, 1026.593, 1262.812, 1498.812, 1732.921, 1968.625, 2204.171, 2432.859, 3267.234, 4100.296, 4925.562, 5754.89, 6582.375, 7418.484, 8255.968, 9088.578, 10826.546], '009': [58.265, 293.953, 525.468, 760.765, 993.484, 1226.468, 1465.156, 1699.468, 1934.734, 2167.296, 2398.968, 3229.437, 4063.75, 4889.875, 5720.14, 6547.843, 7380.828, 8219.515, 9053.453, 10764.75], '008': [52.593, 286.812, 518.437, 753.234, 986.031, 1218.046, 1457.281, 1692.734, 1926.812, 2160.234, 2391.125, 3223.203, 4056.187, 4882.671, 5711.859, 6541.671, 7373.046, 8212.687, 9046.843, 10753.296], '015': [102.593, 336.546, 568.078, 803.89, 1032.828, 1270.921, 1506.562, 1741.046, 1975.687, 2211.093, 2439.546, 3273.109, 4107.265, 4933.656, 5762.375, 6588.984, 7425.765, 8263.437, 9096.593, 10837.093]}

Example CPtiles file:

clusterID
M00653:72:000000000-AKPP5:1:2101:19526:15124    001
M00653:72:000000000-AKPP5:1:2102:4005:7276      002
M00653:72:000000000-AKPP5:1:2102:14626:8517     002
M00653:72:000000000-AKPP5:1:2102:19935:11152    002
M00653:72:000000000-AKPP5:1:2102:23710:15626    002
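The time dict and the CPtiles file work together: a cluster's tile selects which list of times applies to its fluorescence values. A minimal sketch of that lookup, using plain dictionaries and an abbreviated time dict (the helper name `times_for_cluster` is hypothetical and not part of fittinglibs):

```python
# Abbreviated time dict: first three time points of tiles '001' and '002'
# from the example time dict above.
time_dict = {'001': [0.0, 236.843, 467.984],
             '002': [6.546, 243.375, 476.046]}

# Cluster -> tile mapping, mirroring the first two rows of the CPtiles example.
tiles = {
    'M00653:72:000000000-AKPP5:1:2101:19526:15124': '001',
    'M00653:72:000000000-AKPP5:1:2102:4005:7276': '002',
}

def times_for_cluster(cluster_id, tiles, time_dict):
    """Return the time points that apply to one cluster, via its tile."""
    return time_dict[tiles[cluster_id]]

for cluster_id in sorted(tiles):
    print(cluster_id, times_for_cluster(cluster_id, tiles, time_dict))
```

Because each tile is imaged at a slightly different moment, two clusters in the same CPseries column can have different time values, which is why the times are stored per tile rather than per column.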

Normalization

I normalize my data by a single value from the red channel.

# normalize green channel CPseries file by red channel CPseries file.
basename=offRates/AKPP5_ALL_Bottom_filtered_reduced
python -m normalizeSeries -b $basename.CPseries.pkl -a $basename"_red.CPseries.pkl"

This produces a CPseries file in which the fluorescence values have been normalized (see [normalization](Normalization of cluster fluorescence intensities) for more info).
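The core idea of normalizing by a single red-channel value can be sketched as dividing every time point of a cluster by that cluster's red value. This is only an illustration with made-up numbers; normalizeSeries itself may filter or scale the red values in ways not shown here.

```python
import numpy as np

# Hypothetical green-channel series: rows are clusters, columns are time points.
green = np.array([[200.0, 140.0, 90.0],
                  [50.0, 40.0, 36.0]])

# Hypothetical red-channel value per cluster (one number each).
red = np.array([100.0, 25.0])

# Divide every time point of a cluster by that cluster's red value.
normalized = green / red[:, np.newaxis]
print(normalized)
```

After this step, clusters with very different absolute brightness end up on a comparable scale, so fit parameters such as fmax can be compared across clusters.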

Fit rates per cluster

Fit each cluster to either an off-rate curve or an on-rate curve (the default is off).

# fit single clusters
basename=offRates/AKPP5_ALL_Bottom_filtered_reduced
normbasename=$basename"_normalized"
python -m fitRatesPerCluster -cs $normbasename.CPseries.pkl -t $basename.CPtiles.pkl -td offRates/rates.timeDict.p 

Note: to check that the script is working, use the --subset flag to fit only a subset of clusters; this should take less than a minute to run.

This produces two outputs:

  1. Single cluster fits.
    • A CPfitted file: a pandas DataFrame, indexed on the clusterID, with columns corresponding to the fit parameters (fmax, koff, fmin) and their associated standard errors (fmax_stde, etc.), as well as the coefficient of determination (rsq), the exit flag (from lmfit), and the root mean squared error (rmse).
    • koff is in 1/second.
  2. Fit parameters.
    • A fitParameters file giving the initial guesses and the upper and lower bounds on the three fit parameters (fmax, koff, fmin). Note: these are the parameters that are shared across all clusters. The initial guess for fmax is estimated per cluster (from the maximum value of the fluorescence), and so is set to NaN in this file.
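To illustrate what a single-cluster fit involves, here is a sketch that fits synthetic data to a single-exponential decay. Several things here are assumptions: the model form `fmin + (fmax - fmin) * exp(-koff * t)`, the koff and fmin initial guesses, and the bounds. The pipeline uses lmfit, whereas this sketch swaps in `scipy.optimize.curve_fit` to stay short, so it is not the pipeline's implementation.

```python
import numpy as np
from scipy.optimize import curve_fit

def off_rate_curve(t, fmax, koff, fmin):
    """Assumed model: single-exponential decay from fmax toward fmin."""
    return fmin + (fmax - fmin) * np.exp(-koff * t)

# Time points for tile '001', taken from the example time dict.
times = np.array([0.0, 236.843, 467.984, 703.843, 938.812, 1165.984,
                  1407.875, 1758.843, 1992.937, 2229.937, 2755.375,
                  3588.078, 4423.875, 5662.453, 6492.375, 7318.14,
                  8163.687, 8995.953, 9831.593, 14224.593])

# Synthetic signal with known parameters plus a little noise (not real data).
true_fmax, true_koff, true_fmin = 1.5, 5e-4, 0.1
rng = np.random.default_rng(0)
signal = off_rate_curve(times, true_fmax, true_koff, true_fmin)
signal = signal + rng.normal(0.0, 0.01, size=times.shape)

# Initial guess for fmax comes from the maximum signal, as described above;
# the other guesses and all bounds are placeholders.
p0 = [signal.max(), 1e-3, signal.min()]
bounds = ([0.0, 1e-6, 0.0], [np.inf, 1.0, np.inf])
popt, pcov = curve_fit(off_rate_curve, times, signal, p0=p0, bounds=bounds)

# Standard errors from the diagonal of the covariance matrix.
stde = np.sqrt(np.diag(pcov))
fmax, koff, fmin = popt
print('fmax=%.3f koff=%.2e fmin=%.3f' % (fmax, koff, fmin))
```

Fitting synthetic data with known parameters first is a cheap way to check that the time points span enough of the decay to constrain koff before running the full data set.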

Example CPfitted file:

                                                  fmax      koff          fmin  fmax_stde  koff_stde  fmin_stde       rsq  exit_flag      rmse
M00653:72:000000000-AKPP5:1:2110:15278:23194  0.155019  0.000183  3.101868e-02   0.017777   0.000151   0.044081  0.587036          1  0.133084
M00653:72:000000000-AKPP5:1:2110:13233:12911  2.781173  0.000667  9.853043e-02   0.078231   0.000042   0.037590  0.987463          3  0.399757
M00653:72:000000000-AKPP5:1:2110:8320:10862   1.855716  0.000155  2.997809e-01   0.070028   0.000049   0.232296  0.922999          1  0.544643
M00653:72:000000000-AKPP5:1:2110:16526:20332  1.493321  0.000225  1.262401e-10   0.000000   0.000000   0.000000  0.966848          1  0.362984
M00653:72:000000000-AKPP5:1:2110:12234:20330  1.756226  0.000397  9.702027e-03   0.072196   0.000047   0.056499  0.969505          1  0.434580
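Since koff is in 1/second, it can be easier to read as a half-life. Assuming single-exponential dissociation, the half-life is ln(2)/koff; the sketch below applies that to the koff values from the example rows above.

```python
import math

# koff values (1/second) copied from the example CPfitted rows above.
koffs = [0.000183, 0.000667, 0.000155, 0.000225, 0.000397]

for koff in koffs:
    half_life_s = math.log(2) / koff  # seconds
    print('koff=%.6f 1/s -> half-life %.0f s (%.1f min)'
          % (koff, half_life_s, half_life_s / 60.0))
```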

Notes

  • This script can also be used to fit on rates: set --fittype on.
  • This script can accommodate a photobleaching correction.
    • Photobleaching correction requires the fraction of signal remaining after each round of imaging; for example, --pb_correct 0.985 assumes a 1.5% loss per image.
    • It also requires some indication of how the columns in the CPseries file relate to the number of images taken. This is expected as a dict, similar to the time dict input, which has to be saved manually and then specified with the -id or --image_n_dict option.
    • If the image n dict is not supplied but --pb_correct is, the script assumes that each column represents one sequential image.
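One way to picture the photobleaching correction is as dividing each column by the remaining fraction raised to the number of images taken. The sketch below assumes exactly that, and also assumes the first column has had zero prior images; the script's actual formula and counting convention are not documented here, and the function name `photobleach_correct` is hypothetical.

```python
import numpy as np

def photobleach_correct(signal, fraction_remaining, image_n=None):
    """Undo an assumed constant fractional loss per image.

    signal: fluorescence values, one per CPseries column.
    fraction_remaining: e.g. 0.985 for a 1.5% loss per image.
    image_n: number of images taken before each column; if omitted,
        columns are treated as sequential images (0, 1, 2, ...).
    """
    signal = np.asarray(signal, dtype=float)
    if image_n is None:
        image_n = np.arange(signal.shape[-1])
    return signal / fraction_remaining ** np.asarray(image_n)

# A constant true signal that loses 1.5% per image is recovered after correction.
observed = 1.0 * 0.985 ** np.arange(5)
corrected = photobleach_correct(observed, 0.985)
print(corrected)
```

Passing an explicit `image_n` covers the case where extra images were taken between the time points kept in the CPseries file, which is what the image n dict is for.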

Bootstrap fit parameters

For the single clusters associated with each variant, bootstrap the fit koff (or kobs) to obtain 95% confidence intervals.

normbasename=offRates/AKPP5_ALL_Bottom_filtered_reduced_normalized
python -m bootStrapFitFile -cf $normbasename.CPfitted.pkl -a anyRNA.CPannot.pkl -p koff

Note:

  • You can change the bootstrapped parameter to any column in the CPfitted file (e.g. -p kobs for on rates, or -p dG for a CPfitted file from a binding series).
  • The default is to bootstrap the medians using 1000 samples.
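Bootstrapping the median can be sketched as follows: resample the clusters of one variant with replacement, take the median of each resample, and read the confidence interval off the percentiles of those medians. This is a generic percentile bootstrap with made-up koff values; how bootStrapFitFile resamples and computes its bounds may differ, and the function name `bootstrap_median_ci` is hypothetical.

```python
import numpy as np

def bootstrap_median_ci(values, n_samples=1000, ci=95, seed=0):
    """Return (lower bound, median, upper bound) for the median of `values`."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample clusters with replacement and record each resample's median.
    idx = rng.integers(0, len(values), size=(n_samples, len(values)))
    medians = np.median(values[idx], axis=1)
    tail = (100 - ci) / 2.0
    lower, upper = np.percentile(medians, [tail, 100 - tail])
    return lower, np.median(values), upper

# Hypothetical koff values (1/second) for clusters sharing one variant.
koffs = [4.1e-4, 3.7e-4, 5.2e-4, 4.4e-4, 3.9e-4, 4.8e-4, 4.0e-4]
koff_lb, koff, koff_ub = bootstrap_median_ci(koffs)
print('koff = %.2e (95%% CI %.2e to %.2e)' % (koff, koff_lb, koff_ub))
```

The interval narrows as more clusters per variant are available, which is why a run on a small subset gives little or no information about the bounds.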

Example CPvariant file:

Note: this one was done on a subset, so there is at most 1 cluster per variant.

                 fmax_init     koff_init   fmin_init numTests fitFraction pvalue numClusters fmax_lb        fmax fmax_ub koff_lb          koff koff_ub fmin_lb        fmin fmin_ub        rsq numIter flag
variant_number
0                0.6103634     0.1077054  0.03575844        1         NaN    NaN         NaN     NaN   0.6103634     NaN     NaN     0.1077054     NaN     NaN  0.03575844     NaN   0.994281    1000  NaN
77               0.3455767   0.001434442  0.07060204        1           1   0.25         NaN     NaN   0.3455767     NaN     NaN   0.001434442     NaN     NaN  0.07060204     NaN  0.6348465    1000  NaN
230              0.7296927    0.02572861  0.02724182        1         NaN    NaN         NaN     NaN   0.7296927     NaN     NaN    0.02572861     NaN     NaN  0.02724182     NaN    0.99382    1000  NaN
300               2.677065  0.0008283294   0.1461111        1           1   0.25         NaN     NaN    2.677065     NaN     NaN  0.0008283294     NaN     NaN   0.1461111     NaN  0.9773668    1000  NaN
352             0.02609393  0.0003054129   0.2038353        1         NaN    NaN         NaN     NaN  0.02609393     NaN     NaN  0.0003054129     NaN     NaN   0.2038353     NaN  0.3989996    1000  NaN