Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls

Justin M. Zook; Brad Chapman; Winston Hide; Marc L. Salit

Published

February 16, 2014

Abstract

Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.

Citation

Nature Biotechnology

Pub Type

Journals

Download Paper

https://doi.org/10.1038/nbt.2835

Keywords

Human whole genome sequencing, DNA sequencing, Reference Materials, Reference Data

Bioscience, Genomics, Health, Clinical diagnostics, Reference data and Reference materials

Citation

Zook, J., Chapman, B., Hide, W. and Salit, M. (2014), Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls, Nature Biotechnology, [online], https://doi.org/10.1038/nbt.2835 (Accessed August 24, 2025)

Created February 16, 2014, Updated February 21, 2019
