Author(s)
Justin M. Zook, Jennifer H. McDaniel, David N. Catoe, Lindsay Harris, Marc L. Salit
Abstract
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. In addition, we describe data from two Personal Genome Project trios, one of Ashkenazi Jewish ancestry and one of Asian ancestry. The data described include BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Pacific Biosciences, SOLiD, 10X Genomics GemCode™ WGS, and Illumina paired-end, mate-pair, and synthetic long read. Cell lines, DNA, and genome sequences of these individuals are publicly available and extensively characterized. Therefore, we expect these data to be useful for revealing novel information about the human genome and for improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Keywords
Genomics, Reference Materials, Reference Data, DNA sequencing, Bioinformatics
Citation
Zook, J., McDaniel, J., Catoe, D., Harris, L. and Salit, M. (2016), Extensive sequencing of seven human genomes to characterize benchmark reference materials, Scientific Data, [online], https://doi.org/10.1038/sdata.2016.25 (Accessed May 1, 2026)