Hundreds of thousands of differences exist between human whole genome variant calls from different sequencing platforms and variant calling pipelines, but the reasons for these differences are poorly understood. Well-characterized whole genome Reference Materials are needed to understand biases and enable performance assessment of DNA sequencing in clinical and research laboratories. We have developed methods to compare and integrate nine whole genome datasets for a single genome and to form consensus genotype calls from all of them, using information about the mapping, alignment, and sequencing biases of the individual datasets. Based on comparison with microarray data, our consensus calls have a 10-fold lower false negative rate than any single dataset, with a similarly low false positive rate. The resulting set of consensus genotypes allows any laboratory to characterize and improve the accuracy and biases (including false negative rates) of different platforms and bioinformatics approaches, which is critical for the clinical translation of genome sequencing.
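The idea of forming a consensus genotype from several datasets can be illustrated with a minimal sketch. This is not the method used in the paper, which the abstract does not specify in detail; it is a simple majority vote under stated assumptions. The function name `consensus_genotype`, the `flagged` set (datasets assumed to show evidence of bias at a site), and the `min_support` threshold are all hypothetical names introduced here for illustration.

```python
from collections import Counter


def consensus_genotype(calls, flagged=frozenset(), min_support=2):
    """Return a consensus genotype for one site, or None if uncertain.

    calls       -- dict mapping dataset name to a genotype string
                   (e.g. "0/1"), or None where the dataset made no call
    flagged     -- hypothetical set of dataset names assumed to show
                   evidence of bias at this site; they do not vote
    min_support -- hypothetical minimum number of agreeing datasets
    """
    # Count votes only from datasets that made a call and are not flagged.
    votes = Counter(
        gt for name, gt in calls.items()
        if gt is not None and name not in flagged
    )
    if not votes:
        return None

    ranked = votes.most_common(2)
    top_gt, top_n = ranked[0]

    # Too few datasets agree: leave the site uncertain.
    if top_n < min_support:
        return None
    # Two genotypes tie for first place: leave the site uncertain.
    if len(ranked) > 1 and ranked[1][1] == top_n:
        return None
    return top_gt


site = {"A": "0/1", "B": "0/1", "C": "1/1", "D": None}
print(consensus_genotype(site))
print(consensus_genotype(site, flagged={"A", "B"}))
```

In this sketch, sites without sufficient agreement are reported as uncertain rather than forced to a call, which is one simple way to keep the false positive rate low when combining datasets.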
Citation: Nature Biotechnology
Pub Type: Journals
Keywords: Human whole genome sequencing, DNA sequencing, Reference Materials, Reference Data