
Bootstrap binding series

sjkdenny edited this page Aug 22, 2016 · 13 revisions

Default behavior

Once you have your single-cluster fits and an estimate of the acceptable distribution of fmax values, the fitting is redone.

  • Fitting proceeds per variant, rather than per cluster
  • The number of clusters per variant determines the acceptable range of fmax (more clusters = greater precision in your estimate of fmax, and therefore a narrower distribution).
  • For each variant, the program will decide whether or not to enforce the fmax distribution
    • This will depend on whether the median fluorescence of the highest concentration in the binding series is above the lower bound on fmax.
  • The bootstrapping by default will take 100 resamples of the clusters per variant.
    • During each iteration, the median fluorescence of the resampled clusters is found, and this vector is fit to a binding curve using the initial parameters derived from the median of the single cluster fits.
    • If fmax was enforced, fmax values are drawn from the distribution for the variant's number of clusters, and one of these values is used in each iteration.
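
The bootstrap loop described above can be sketched roughly as follows. This is a minimal illustration, not the program's actual code: the single-site curve parameterized by dG, the value of RT, the function names `binding_curve` and `bootstrap_variant`, and the synthetic data are all assumptions made for the example. The initial guess `p0` stands in for the parameters derived from the median of the single-cluster fits.

```python
import numpy as np
from scipy.optimize import curve_fit

RT = 0.582  # kcal/mol, roughly 20 C (assumed; not stated in the text above)


def binding_curve(conc, fmax, dG, fmin=0.0):
    """Single-site binding curve; conc in molar, dG in kcal/mol (assumed form)."""
    return fmin + fmax * conc / (conc + np.exp(dG / RT))


def bootstrap_variant(fluor, conc, p0, n_boot=100, fmax_samples=None, seed=0):
    """Return one fitted dG per bootstrap iteration for a single variant.

    fluor: array of shape (n_clusters, n_concentrations).
    p0: (fmax, dG) initial guess.
    fmax_samples: if given, fmax is held at fmax_samples[i] in iteration i
    (the "enforced" case); otherwise fmax is a free parameter.
    """
    rng = np.random.default_rng(seed)
    n_clusters = fluor.shape[0]
    dGs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample clusters with replacement, then take the median per concentration.
        idx = rng.integers(0, n_clusters, size=n_clusters)
        y = np.median(fluor[idx], axis=0)
        if fmax_samples is None:
            popt, _ = curve_fit(lambda c, fmax, dG: binding_curve(c, fmax, dG),
                                conc, y, p0=p0)
            dGs[i] = popt[1]
        else:
            fmax_i = fmax_samples[i]
            popt, _ = curve_fit(lambda c, dG: binding_curve(c, fmax_i, dG),
                                conc, y, p0=[p0[1]])
            dGs[i] = popt[0]
    return dGs


# Synthetic variant: 20 clusters, 8 concentrations (values chosen for illustration).
rng = np.random.default_rng(1)
conc = 2e-6 / 3.0 ** np.arange(7, -1, -1)
fluor = binding_curve(conc, fmax=1000.0, dG=-10.0) + rng.normal(0.0, 50.0, size=(20, conc.size))
p0 = (np.median(fluor[:, -1]), -11.0)

free_dGs = bootstrap_variant(fluor, conc, p0)
fmax_draws = rng.normal(1000.0, 20.0, size=100)  # stand-in for the fmax distribution
enforced_dGs = bootstrap_variant(fluor, conc, p0, fmax_samples=fmax_draws)
print(np.median(free_dGs), np.percentile(free_dGs, [2.5, 97.5]))
print(np.median(enforced_dGs), np.percentile(enforced_dGs, [2.5, 97.5]))
```

The spread of the returned dG values gives the bootstrapped confidence interval. In the enforced case, the uncertainty in fmax is carried into that interval through the per-iteration draws.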

Other options

Note: many of the options described below are in the sarah-develop branch of the git repo. They will be merged upon further testing.

Weighting of fits

By default, the fit weights residuals by the inverse of the width of the 95% confidence interval on the fluorescence at each concentration point. The purpose is to let the data we know best carry the most weight in the fit. However, this weighting can be problematic if there are systematic residuals in the fit, especially if a large residual falls on a point that is weighted heavily during the fit (i.e. you were very precise about a point that was not accurate).

For example, the following binding curve (A) shows that the original, unweighted fit (magenta dotted line) gives a different fit value for dG than the weighted fit (red line). This is likely due to the systematic residuals observed between the fit and the actual fluorescence values. (B) shows the average residuals in the original, unweighted fit and the final, weighted fit, for variants with initial dG between -12.2 and -12 kcal/mol. The final fit lowers the residuals of the earlier binding points, which are weighted more heavily, at the expense of the later binding points.

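[Figure: panel A, binding curve with unweighted and weighted fits; panel B, average residuals of the two fits]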

This effect can be mitigated by not weighting the fits, with the option --no_weights. This option produces the binding curve shown below:

In the future, I think it makes sense to apply weighting only after taking the residuals into account, but this has not been implemented yet.
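
As a rough sketch of the weighting discussed in this section: passing the confidence interval widths as `sigma` to `scipy.optimize.curve_fit` scales each residual by 1/width, so tightly measured points dominate the fit. Several things here are assumptions made for illustration and may differ from the program: the confidence interval is estimated by bootstrapping the median across clusters, the curve is a single-site model with an assumed RT, and `fit_variant` and its `use_weights` argument are hypothetical names (with `use_weights=False` playing the role of --no_weights).

```python
import numpy as np
from scipy.optimize import curve_fit

RT = 0.582  # kcal/mol, roughly 20 C (assumed)


def binding_curve(conc, fmax, dG):
    """Single-site binding curve with fmin fixed at 0 (assumed form)."""
    return fmax * conc / (conc + np.exp(dG / RT))


def fit_variant(fluor, conc, p0, use_weights=True, n_boot=1000, seed=0):
    """Fit the median fluorescence per concentration, optionally weighted.

    fluor: array of shape (n_clusters, n_concentrations).
    Returns the fitted (fmax, dG).
    """
    rng = np.random.default_rng(seed)
    y = np.median(fluor, axis=0)
    sigma = None
    if use_weights:
        n = fluor.shape[0]
        # Bootstrap the median at each concentration to get a 95% CI width.
        idx = rng.integers(0, n, size=(n_boot, n))
        boot_medians = np.median(fluor[idx], axis=1)  # (n_boot, n_concentrations)
        lower, upper = np.percentile(boot_medians, [2.5, 97.5], axis=0)
        # curve_fit minimizes sum((residual / sigma)**2), i.e. weight = 1 / CI width.
        sigma = upper - lower
    popt, _ = curve_fit(binding_curve, conc, y, p0=p0, sigma=sigma)
    return popt


# Synthetic data for illustration only.
rng = np.random.default_rng(2)
conc = 2e-6 / 3.0 ** np.arange(7, -1, -1)
fluor = binding_curve(conc, 1000.0, -10.0) + rng.normal(0.0, 50.0, size=(30, conc.size))
p0 = (np.median(fluor[:, -1]), -11.0)

weighted = fit_variant(fluor, conc, p0, use_weights=True)
unweighted = fit_variant(fluor, conc, p0, use_weights=False)
print("weighted   fmax, dG:", weighted)
print("unweighted fmax, dG:", unweighted)
```

On data without systematic residuals, like this synthetic set, the two fits should agree closely. The divergence described above only appears when the model misses the data in a consistent direction at heavily weighted points.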

Enforcing fmax

There is not much point in running this script unless you want to enforce the distribution of fmax. However, in some cases you may want to compare the effect of always enforcing fmax (i.e. even for variants that passed the cutoff the program applies) with never enforcing the fmax distribution. The program supports this through the input flag --enforce_fmax: supply --enforce_fmax 0 to never enforce the fmax distribution, or --enforce_fmax 1 to always enforce it.
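
The three possible behaviors (default cutoff, never enforce, always enforce) can be sketched as below. This is a hypothetical illustration, not the program's parser or decision code: the function `should_enforce_fmax` and its argument names are made up, and the default rule reflects my reading of the cutoff described above, namely that fmax is enforced when the median fluorescence at the highest concentration falls below the lower bound on fmax.

```python
import argparse


def should_enforce_fmax(median_last_point, fmax_lower_bound, override=None):
    """Decide whether to enforce the fmax distribution for one variant.

    override: None for the default cutoff, 0 to never enforce, 1 to always enforce.
    """
    if override is not None:
        return bool(override)
    # Default (assumed direction): enforce only if the curve has not reached
    # the lower bound on fmax at the highest concentration.
    return bool(median_last_point < fmax_lower_bound)


# Hypothetical parser showing how the flag's three states could be represented.
parser = argparse.ArgumentParser()
parser.add_argument("--enforce_fmax", type=int, choices=[0, 1], default=None)

default_args = parser.parse_args([])
never_args = parser.parse_args(["--enforce_fmax", "0"])
always_args = parser.parse_args(["--enforce_fmax", "1"])

# A saturating variant (950 > 800) and a non-saturating one (300 < 800).
for median_last in (950.0, 300.0):
    print(median_last,
          should_enforce_fmax(median_last, 800.0, default_args.enforce_fmax),
          should_enforce_fmax(median_last, 800.0, never_args.enforce_fmax),
          should_enforce_fmax(median_last, 800.0, always_args.enforce_fmax))
```

Leaving the flag unset (`None`) keeps the per-variant decision, which is why the sketch uses `None` as the default rather than 0 or 1.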

Allowing fmin to float

In practice, I have not allowed fmin to float during this script because my data tends to start at a concentration that already allows binding for some subset of variants (i.e. because I haven't used fiducial marks and so haven't quantified images with no binding). Other datasets may not be designed in this fashion, and for those you may want fmin to float. You can do this by providing the option --fmin_float. I have not seen any examples where allowing fmin to float produced appreciably different results, so it is not the default. This may change in the future.
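
To make the difference concrete, here is a small sketch of fitting with fmin held fixed versus allowed to float. The function `fit_binding`, its `fmin_float` and `fmin_fixed` arguments, the curve form, and the value of RT are all assumptions for illustration. The fixed value of fmin used by the program is not stated on this page, so 0 is only a placeholder.

```python
import numpy as np
from scipy.optimize import curve_fit

RT = 0.582  # kcal/mol, roughly 20 C (assumed)


def binding_curve(conc, fmax, dG, fmin):
    """Single-site binding curve with a baseline fmin (assumed form)."""
    return fmin + fmax * conc / (conc + np.exp(dG / RT))


def fit_binding(conc, y, p0, fmin_float=False, fmin_fixed=0.0):
    """Fit (fmax, dG), with fmin either held at fmin_fixed or fit as a third parameter.

    p0: (fmax, dG) initial guess. Returns a dict of fitted values.
    """
    if fmin_float:
        popt, _ = curve_fit(binding_curve, conc, y, p0=[p0[0], p0[1], fmin_fixed])
        return {"fmax": popt[0], "dG": popt[1], "fmin": popt[2]}
    popt, _ = curve_fit(lambda c, fmax, dG: binding_curve(c, fmax, dG, fmin_fixed),
                        conc, y, p0=list(p0))
    return {"fmax": popt[0], "dG": popt[1], "fmin": fmin_fixed}


# Noise-free synthetic curve with a nonzero baseline, for illustration only.
conc = 2e-6 / 3.0 ** np.arange(7, -1, -1)
y = binding_curve(conc, fmax=1000.0, dG=-10.0, fmin=50.0)
p0 = (y[-1] - y[0], -11.0)

fixed_fit = fit_binding(conc, y, p0, fmin_float=False)
float_fit = fit_binding(conc, y, p0, fmin_float=True)
print("fmin fixed:", fixed_fit)
print("fmin float:", float_fit)
```

When the lowest concentration already shows binding, as described above, the baseline is poorly constrained by the data, which is one reason to keep fmin fixed in that situation.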