In quantitative polymerase chain reaction (qPCR) measurements, accurately subtracting background signals, quantifying the amount of DNA, and ensuring the fidelity of data analysis and diagnostic tests are challenging tasks. Such problems are especially acute for emerging / novel diseases that lack thoroughly developed reference materials. This invention describes a new background subtraction and data analysis algorithm for qPCR that addresses these problems. Critically, the algorithm can decrease the fluorescence thresholds used for identifying DNA, thereby increasing measurement sensitivity by up to a factor of 10 relative to state-of-the-art approaches. Moreover, the algorithm encompasses a set of automated consistency checks that determine when a measurement is not behaving as expected, thereby facilitating real-time quality control of the measurement apparatus and testing protocol.
Quantitative polymerase chain reaction (qPCR) measurements are a mainstay diagnostic tool for early disease detection. The technique works by iterating, or “cycling,” a reaction that doubles the amount of a target DNA segment in a sample. With each cycle of PCR, every new copy of DNA is accompanied by the release of a fluorescent signal. By targeting a specific genetic sequence associated with, e.g., a viral genome, qPCR can therefore detect viral particles in a patient sample by exponentially amplifying the corresponding DNA, which is observed indirectly via the increasing amplitude of fluorescence; see Fig. 1. (Note that this method is also used for RNA detection; the RNA is first converted to DNA through an enzymatic process known as reverse transcription.)
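To make the doubling-and-plateau behavior concrete, the following sketch simulates an idealized amplification curve. The function name, parameter values, and logistic-style saturation are illustrative assumptions for this example only, not the disclosed chemistry or algorithm.

```python
# Illustrative sketch of per-cycle doubling with a reagent-limited plateau;
# the logistic saturation and parameter values are assumptions for
# illustration, not the disclosed chemistry.
def amplification_curve(n_cycles=40, n0=10.0, efficiency=1.0, plateau=1.0e10):
    """Simulate copy number (proportional to fluorescence) versus cycle."""
    copies = [n0]
    for _ in range(n_cycles):
        n = copies[-1]
        # Doubling slows as reagents deplete (logistic-style saturation).
        copies.append(n + efficiency * n * (1.0 - n / plateau))
    return copies

curve = amplification_curve()  # near-perfect doubling early, plateau late
```

The early cycles approximately double the template each iteration, producing the exponential growth that qPCR detects; the curve then saturates, giving the characteristic sigmoidal shape in Fig. 1.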
Even under idealized conditions, photodetectors in qPCR instruments may not be able to detect meaningful signals until billions of fluorophores or more are present; moreover, the first few cycles of an amplification may fail to duplicate DNA. Although standard amplification protocols are limited to roughly 40 cycles (due to reagent stability), this is still sufficient to detect a single target DNA strand. However, fluorescence measurements must account for (non-ideal) background sources of light that obscure signals of interest. State-of-the-art techniques (see, e.g., the patents cited below) assume that empirical polynomial models can be used to characterize and subtract off this background, despite significant experimental evidence to the contrary; see Fig. 2. Moreover, these polynomials are often extrapolated to correct data outside the fit region, a practice long known to introduce significant systematic errors into measurements. Thus, state-of-the-art background subtraction techniques in qPCR may in fact decrease the sensitivity of diagnostic protocols by increasing the apparent noise floor. See also Fig. 3.
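The extrapolation hazard can be demonstrated in a few lines: fitting even the simplest polynomial (a line) to the early cycles of a smooth, non-polynomial background and extrapolating to late cycles produces a large systematic error, here a nonphysical negative fluorescence. The background shape and fit window below are hypothetical illustrations, not data from any cited patent.

```python
import math

# Hypothetical non-polynomial background: a slow photobleaching-style decay.
def background(cycle):
    return math.exp(-cycle / 15.0)

# Fit a degree-1 polynomial (a line) to cycles 1..10 by least squares.
cycles = list(range(1, 11))
ys = [background(c) for c in cycles]
mx = sum(cycles) / len(cycles)
my = sum(ys) / len(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(cycles, ys))
         / sum((x - mx) ** 2 for x in cycles))
intercept = my - slope * mx

# Inside the fit window the line is a fine approximation, but extrapolating
# far outside it predicts negative fluorescence at cycle 40.
extrapolated = slope * 40 + intercept
true_value = background(40)
```

Subtracting such an extrapolated background from a measurement would inflate the apparent noise floor at exactly the cycles where weak signals appear.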
This problem is further compounded by the data analysis techniques that underpin diagnostic tests and quantification strategies. Such approaches rely on subjective thresholds that the amplification curve (as a function of cycle number) must surpass to be considered a meaningful signal, i.e., a true positive. Because these thresholds must be significantly larger (e.g., by a factor of ten) than the apparent noise floor, systematic errors unnecessarily force thresholds higher, increasing the probability of false negatives; see Figs. 1 and 3. Moreover, threshold-based methods do not directly test that the signal exhibits exponential growth, so in extreme cases systematic background errors may lead to false positives. Even when backgrounds are correctly subtracted, thresholds may still fail to detect noisy but statistically significant signals.
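A minimal sketch of threshold-based calling illustrates this failure mode: the same exponentially growing signal that clears a modest threshold is called negative under an inflated one. The synthetic signal and threshold values are arbitrary illustrations, not parameters of any real assay.

```python
# Minimal sketch of threshold-based calling (the state of the art critiqued
# above); the synthetic signal and threshold values are arbitrary.
def threshold_cq(fluorescence, threshold):
    """Return the first cycle whose background-subtracted fluorescence
    exceeds the threshold, or None (read as a negative result)."""
    for cycle, f in enumerate(fluorescence, start=1):
        if f > threshold:
            return cycle
    return None

# An exponentially growing but weak signal, truncated at cycle 15: an
# inflated threshold turns a true positive into a false negative.
truncated = [0.01 * 2.0 ** c for c in range(1, 16)]
call_high = threshold_cq(truncated, threshold=500.0)  # missed
call_low = threshold_cq(truncated, threshold=50.0)    # detected
```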
The invention disclosed herein, which we call the “Affine algorithm for qPCR background subtraction and data analysis,” addresses all of these issues. The algorithm is divided into two main parts: background subtraction and signal detection.
Background subtraction proceeds by first measuring the fluorescence intensity as a function of cycle number for an extraction blank or no-template control, i.e., a sample that should contain no DNA; see Fig. 2. Ideally, multiple such samples are measured, with some aggregate or mean signal used in subsequent steps. In practice, the amount (but not the shape) of this background signal varies from measurement to measurement by an unknown amount, so it is not possible to simply subtract the background from each measurement. Rather, the algorithm solves an optimization problem that determines the amount of background which, when removed from the signal, minimizes the mean baseline and its variation. (The baseline corresponds to the first few cycles of the amplification curve, for which there are too few fluorophores to detect.) See Fig. 4.
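A hedged sketch of this background-scaling step follows: a scan over the scale factor applied to the control signal, minimizing a combined measure of the baseline mean and its variation. The grid search, the objective (squared mean plus variance), and the 5-cycle baseline window are illustrative assumptions, not the disclosed formulation of the optimization.

```python
# Hedged sketch of the background-scaling step: scan for the amount `alpha`
# of the control (background) signal that, once removed, minimizes the mean
# of the early-cycle baseline and its variation. The grid search, objective
# (mean^2 + variance), and 5-cycle baseline are illustrative assumptions.
def subtract_background(signal, background, n_baseline=5):
    def objective(alpha):
        base = [s - alpha * b
                for s, b in zip(signal[:n_baseline], background[:n_baseline])]
        mean = sum(base) / len(base)
        var = sum((x - mean) ** 2 for x in base) / len(base)
        return mean ** 2 + var  # penalize both offset and scatter
    alphas = [i / 1000.0 for i in range(2001)]  # scan 0.000 .. 2.000
    best = min(alphas, key=objective)
    corrected = [s - best * b for s, b in zip(signal, background)]
    return corrected, best

# Synthetic example: the control background enters the measurement scaled
# by an unknown factor (here 0.8) on top of a weak exponential signal.
background = [float(c) for c in range(1, 41)]
signal = [0.8 * b + 1e-6 * 2.0 ** c for c, b in zip(range(1, 41), background)]
corrected, alpha = subtract_background(signal, background)
```

The recovered scale factor matches the unknown amount of background, and the corrected baseline sits near zero while the amplified signal survives the subtraction.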
The second part of the algorithm is based on a new theoretical result demonstrating that, up to a scalar multiple and a horizontal shift, all amplification curves for a given chemistry are identical. This result is unusual in that it holds for all phases of a qPCR amplification curve, including the “post-exponential” region that is typically ignored in diagnostic testing and quantification measurements. To leverage this result, our algorithm first constructs a reference or “master” amplification curve that exhibits exponential growth at an early cycle number, since such data fully explores all phases of the reaction: background noise / no detectable signal, exponential growth, and plateau. This reference is typically a raw measurement signal from a well-prepared system, although it may require some smoothing where the exponential phase meets the noise floor, e.g., via a physics-based approach (this was not done in any of the examples below). Next, we solve a constrained optimization problem that maps a new measurement signal onto this master curve, subject to tolerances such as low relative deviations between the two. These tolerances can be determined from the noise floor computed in the first part of the algorithm, thereby automating the formulation of the optimization. Critically, this ensures that the measured signal exhibits exponential growth consistent with the reference curve to within the noise of the instrumentation. Curves that do not satisfy these constraints can be flagged for further consideration, or even rejected on the grounds of having systematic errors.
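The mapping step can be sketched as a grid search over integer cycle shifts with a closed-form least-squares scale at each shift, accepting a curve only if the relative residual sits within a noise tolerance. The discrete shift, the residual definition, and the tolerance value are simplifying assumptions relative to the disclosed constrained optimization.

```python
# Hedged sketch of the affine collapse: grid-search an integer cycle shift s
# and compute the closed-form least-squares scale a at each shift, accepting
# the curve only if the relative residual is within a noise tolerance.
def collapse_onto_master(signal, master, max_shift=20, tol=0.05):
    best = None
    for s in range(-max_shift, max_shift + 1):
        pairs = [(signal[c], master[c - s]) for c in range(len(signal))
                 if 0 <= c - s < len(master)]
        if len(pairs) < 5:
            continue  # too little overlap to judge consistency
        den = sum(m * m for _, m in pairs)
        if den == 0.0:
            continue
        a = sum(x * m for x, m in pairs) / den  # least-squares scale
        resid = (sum((x - a * m) ** 2 for x, m in pairs)
                 / sum(x * x for x, _ in pairs))
        if best is None or resid < best[2]:
            best = (s, a, resid)
    s, a, resid = best
    return s, a, resid <= tol  # accepted only if consistent within tolerance

# Synthetic check: a curve that is 3x the master, shifted 4 cycles later.
master = [min(2.0 ** c, 1.0e4) for c in range(40)]
signal = [3.0 * min(2.0 ** (c - 4), 1.0e4) for c in range(40)]
shift, scale, accepted = collapse_onto_master(signal, master)
```

A curve that cannot be collapsed onto the master to within the tolerance would return `accepted == False` and be flagged for further consideration.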
In addition to improving DNA detection, the second part of the algorithm may also provide a new method for quantifying the initial amount of DNA in a sample. State-of-the-art approaches rely on multiple calibration measurements with varying but known concentrations of initial DNA. The cycle Cq at which these amplification curves cross a threshold is used to generate a model quantifying the initial amount of DNA as a function of Cq; this model is then the basis for all subsequent characterization of samples with unknown initial DNA concentrations. Our algorithm may obviate the need to generate this model by extracting the initial DNA number directly from the transformation parameters that map an amplification curve onto the master curve. Work to validate this method is ongoing, however.
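For reference, the state-of-the-art standard-curve calibration described above can be sketched in a few lines: with perfect doubling, Cq decreases by 1/log10(2) ≈ 3.32 cycles per decade of starting copies, so a linear fit of Cq against log10 of known copy numbers is inverted to quantify unknowns. The calibration data below are synthetic illustrations.

```python
import math

# Sketch of the state-of-the-art standard-curve calibration: Cq falls
# linearly with log10 of the starting copy number (perfect doubling gives a
# slope of -1/log10(2) ~ -3.32 cycles per decade). The data are synthetic.
def fit_standard_curve(copies, cqs):
    """Least-squares fit of Cq = slope * log10(N0) + intercept."""
    xs = [math.log10(n) for n in copies]
    mx = sum(xs) / len(xs)
    my = sum(cqs) / len(cqs)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, cqs))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def quantify(cq, slope, intercept):
    """Invert the standard curve to estimate the initial copy number."""
    return 10.0 ** ((cq - intercept) / slope)

# Synthetic standards with ideal doubling behavior.
copies = [1.0e2, 1.0e4, 1.0e6]
cqs = [33.0 - 3.32 * math.log10(n) for n in copies]
slope, intercept = fit_standard_curve(copies, cqs)
```

The invention aims to replace this multi-measurement calibration with the transformation parameters of the affine collapse, though that route remains under validation.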
In reduction to practice, we have found that this algorithm works for a variety of human DNA constructs. Moreover, using human standard reference DNA, we have demonstrated that a given master curve remains valid over many years. This suggests that it may be possible to generate only a single reference curve for use in all assays of a given type. The transformation coefficients determined by the second part of the algorithm should also be related to the quantity of initial DNA in a sample, although more work remains to confirm this relationship. We also note that our algorithm may provide a route to developing preliminary standards, in the form of a master curve, for novel / emerging infectious diseases.
Figures 1 through 6 illustrate the key elements of our algorithm, while Figs. 7, 8, and 9 illustrate its potential to increase the sensitivity of testing. In Fig. 7, we have truncated a collection of amplification curves that are known to be true positives. We then perform the affine analysis on these truncated curves, showing that collapse to within noise thresholds can be achieved without any false negatives. Notably, the truncated datasets have maximum fluorescence values a factor of four below thresholds typically used for this DNA amplification chemistry.
The main limitation of this algorithm is the requirement of one or more measurements of a control sample. While such controls are typically performed in any qPCR protocol, our method benefits from multiple such measurements, which may limit the number of other amplifications that can be run simultaneously. A key benefit of this invention, however, is that it can be implemented within a software framework without fundamentally needing to change the chemistry or underlying experimental aspects of qPCR.