Extensive sequencing of seven human genomes to characterize benchmark reference materials

Published: June 07, 2016


Justin M. Zook, Jennifer H. McDaniel, David N. Catoe, Lindsay Harris, Marc L. Salit


The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. In addition, we describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Asian ancestry. The data described include BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Pacific Biosciences, SOLiD, 10X Genomics GemCode™ WGS, and Illumina paired-end, mate-pair, and synthetic long read. Cell lines, DNA, and genome sequences of these individuals are publicly available and highly characterized. Therefore, we expect these data to be useful for revealing novel information about the human genome and for improving sequencing technologies; SNP, indel, and structural variant calling; and de novo assembly.
Citation: Scientific Data
Pub Type: Journals


Keywords: Genomics, Reference Materials, Reference Data, DNA sequencing, Bioinformatics
Created June 07, 2016, Updated November 10, 2018