
Bootstrap binding series

sjkdenny edited this page Aug 22, 2016 · 13 revisions

Once you have the single-cluster fits, and from them an estimate of the acceptable distribution of fmax values, the fit is redone. This time, fitting proceeds per variant rather than per cluster, and the number of clusters per variant determines the acceptable range of fmax (more clusters = greater precision in the fmax estimate, and therefore a narrower distribution). For each variant, the program decides whether or not to enforce the fmax distribution, depending on whether the median fluorescence at the highest concentration in the binding series is above the lower bound on fmax.

By default, the bootstrapping takes 100 resamples of the clusters per variant. In each iteration, the median fluorescence of the resampled clusters is found, and this vector is fit to a binding curve, using initial parameters derived from the median of the single-cluster fits. If fmax is enforced, fmax values are drawn from the distribution for the corresponding number of clusters, and one of those values is used for each iteration.
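The resampling loop described above can be sketched roughly as follows. This is a minimal illustration and not the program's actual code: the single-site form of `binding_curve`, the value of `RT`, and the function names are all assumptions. The nonlinear fit seeded from the single-cluster medians is replaced here by a plain grid search over dG with fmax and fmin held fixed, which keeps the example dependency-free.

```python
import math
import random
import statistics

RT = 0.582  # kcal/mol, roughly 20 C; assumed value for illustration only


def binding_curve(conc, dG, fmax, fmin):
    """Assumed single-site isotherm; conc and Kd are both in molar units."""
    kd = math.exp(dG / RT)
    return fmin + fmax * conc / (conc + kd)


def fit_dG(concs, signal, fmax, fmin, grid):
    """Stand-in fit: least-squares grid search over dG, fmax and fmin fixed."""
    def sse(dG):
        return sum((binding_curve(c, dG, fmax, fmin) - y) ** 2
                   for c, y in zip(concs, signal))
    return min(grid, key=sse)


def bootstrap_variant(clusters, concs, fmax_draws, fmin, grid,
                      n_resamples=100, seed=0):
    """Bootstrap one variant.

    clusters:   one fluorescence series per cluster (same length as concs)
    fmax_draws: one enforced fmax value per iteration (hypothetical input,
                standing in for draws from the fmax distribution)
    """
    rng = random.Random(seed)
    fits = []
    for i in range(n_resamples):
        # Resample the clusters with replacement.
        sample = [rng.choice(clusters) for _ in clusters]
        # Median fluorescence of the resampled clusters at each concentration.
        medians = [statistics.median(column) for column in zip(*sample)]
        # Fit the median vector, using this iteration's fmax value.
        fits.append(fit_dG(concs, medians, fmax_draws[i], fmin, grid))
    return fits
```

The spread of the returned dG values (for example, their 2.5th and 97.5th percentiles) then gives a bootstrap estimate of the uncertainty on the variant's fit.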

Weighting of fits

By default, the fit weights residuals by the inverse of the width of the 95% confidence interval on the fluorescence at each concentration point. The purpose is to let the data points we know more precisely carry more weight in the fit. However, this weighting can be problematic if there are systematic residuals in the fit, especially if a large residual falls on a point that is weighted heavily during the fit (i.e. you were very precise about a point that was not accurate).
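As a rough sketch of this weighting scheme, the snippet below builds weights from the confidence interval bounds and applies them to the residuals. It is illustrative only: the single-site `binding_curve`, the `RT` value, the function names, and the grid-search fit are assumptions, and how the confidence interval itself is obtained is left to the caller.

```python
import math

RT = 0.582  # kcal/mol, roughly 20 C; assumed value for illustration only


def binding_curve(conc, dG, fmax, fmin):
    """Assumed single-site isotherm; conc and Kd are both in molar units."""
    kd = math.exp(dG / RT)
    return fmin + fmax * conc / (conc + kd)


def ci_weights(lower, upper, floor=1e-12):
    """Weight each point by 1 / (width of its 95% confidence interval)."""
    return [1.0 / max(hi - lo, floor) for lo, hi in zip(lower, upper)]


def weighted_sse(dG, concs, signal, weights, fmax, fmin):
    """Sum of squared residuals, each residual scaled by its weight."""
    return sum((w * (binding_curve(c, dG, fmax, fmin) - y)) ** 2
               for c, y, w in zip(concs, signal, weights))


def fit_dG(concs, signal, weights, fmax, fmin, grid):
    """Stand-in fit: grid search over dG minimizing the weighted residuals."""
    return min(grid, key=lambda dG: weighted_sse(dG, concs, signal,
                                                 weights, fmax, fmin))
```

Passing uniform weights (`[1.0] * len(concs)`) gives the unweighted fit for comparison. A single point with a narrow interval but a biased value dominates `weighted_sse`, which is the failure mode described above.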

For example, the following binding curve (A) shows that the original, unweighted fit (magenta dotted line) gives a different value of dG than the weighted fit (red line). This is likely due to the systematic residuals observed between the fit and the actual fluorescence values. (B) shows the average residuals in the original, unweighted fit and the final, weighted fit, for variants with initial dG between -12.2 and -12 kcal/mol. The final fit has higher overall residuals because it is differentially weighting the earlier fit points.

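[Figure: (A) binding curve with the unweighted and weighted fits; (B) average residuals of the unweighted and weighted fits.]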