Understanding the characteristics of sequence-based single-source DNA profiles
Sarah Riman, Hariharan K. Iyer, Lisa A. Borsuk, Peter M. Vallone
The sequencing of STR markers reveals underlying sequence variation that is typically masked by traditional fragment-based genotyping. However, the interpretation of STR profiles generated by targeted sequencing methods is susceptible to the same factors encountered in profiles processed through capillary gel electrophoresis. These factors include stochastic variation, noise, stutter artifacts, heterozygote imbalance, and allelic drop-out and drop-in. Our goal is to characterize and understand how these factors behave in targeted sequence datasets. Here, we developed a framework using statistical tools to systematically interpret the characteristics of single-source DNA profiles generated by targeted sequencing. Sensitivity studies were performed using known single-source samples amplified with the PowerSeq 46GY System Prototype at DNA target masses ranging from 15 pg to 500 pg. The STR loci were subjected to DNA library preparation using two commercially available library kits and sequenced on the Illumina MiSeq platform. Raw FASTQ data files were analyzed in STRait Razor v2.0 without applying any thresholds (that is, at a coverage ≥ 1). We investigated the effect of library normalization on average sequence coverage and studied methods for setting analytical and zygosity thresholds. All data were analyzed per DNA quantity and per method. The analyses presented can be applied to sequence data generated by similar targeted sequencing panels and/or NGS platforms.
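To make the quantities named above concrete, the following is a minimal sketch of how heterozygote balance, a stutter ratio, and a noise-based analytical threshold might be computed from per-sequence read counts. It is not the framework developed in this work: the function names, the mean-plus-k-standard-deviations threshold rule, and all read counts are assumptions chosen for illustration only.

```python
from statistics import mean, stdev


def heterozygote_balance(reads_a: int, reads_b: int) -> float:
    """Ratio of the lower-coverage allele to the higher-coverage allele (0 to 1)."""
    if reads_a <= 0 or reads_b <= 0:
        raise ValueError("both alleles need positive coverage")
    return min(reads_a, reads_b) / max(reads_a, reads_b)


def stutter_ratio(stutter_reads: int, parent_reads: int) -> float:
    """Coverage of a stutter sequence relative to its parent allele."""
    if parent_reads <= 0:
        raise ValueError("parent allele needs positive coverage")
    return stutter_reads / parent_reads


def analytical_threshold(noise_reads: list[int], k: float = 3.0) -> float:
    """Assumed rule: mean noise coverage plus k sample standard deviations."""
    return mean(noise_reads) + k * stdev(noise_reads)


# Hypothetical read counts at one heterozygous locus (not data from the study).
allele_1_reads = 820
allele_2_reads = 640
stutter_reads = 82              # stutter product of allele 1
noise = [1, 2, 1, 3, 2, 1, 4, 2]  # reads of sequences that are neither allele nor stutter

hb = heterozygote_balance(allele_1_reads, allele_2_reads)
sr = stutter_ratio(stutter_reads, allele_1_reads)
at = analytical_threshold(noise)

print(f"heterozygote balance: {hb:.3f}")
print(f"stutter ratio:        {sr:.3f}")
print(f"analytical threshold: {at:.2f} reads")
```

In this sketch, sequences with coverage below the computed threshold would be treated as noise. The choice of k and the use of a single threshold across loci are simplifications; per-locus or per-DNA-quantity thresholds would follow the same pattern applied to subsets of the data.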