Publication Citation: Why human genome sequencing platforms differ: Integrating sequencing datasets to understand biases and form consensus variant calls

Author(s): Justin M. Zook; Brad Chapman; Winston Hide; Marc L. Salit
Title: Why human genome sequencing platforms differ: Integrating sequencing datasets to understand biases and form consensus variant calls
Published: February 16, 2014
Abstract: Hundreds of thousands of differences exist between human whole genome variant calls from different sequencing platforms and variant calling pipelines, but the reasons for these differences are poorly understood. Well-characterized whole genome Reference Materials are needed to understand biases and enable performance assessment of DNA sequencing in clinical and research laboratories. We have developed methods to compare and integrate 9 whole genome datasets for one genome, and to form consensus genotype calls from all datasets by using information about the mapping, alignment, and sequencing biases of individual datasets. Based on microarray data, our consensus calls have a 10x lower false negative rate than any single dataset, with a similarly low false positive rate. The resulting set of consensus genotypes allows any laboratory to characterize and improve the accuracy and biases (including false negative rates) of different platforms and bioinformatics approaches, which is critical for clinical translation of genome sequencing.
Citation: Nature Biotechnology
Volume: 32
Pages: 246-251
Keywords: Human whole genome sequencing; DNA sequencing; Reference Materials; Reference Data
Research Areas: Molecular Pathology, Clinical Diagnostics, Life Sciences Research, Medical Devices, Standard Reference Materials, Standard Reference Data, Bioscience & Health
DOI: http://dx.doi.org/10.1038/nbt.2835