The Genome in a Bottle Consortium hosted by the National Institute of Standards and Technology, (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. In addition, we describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Asian ancestry. The data described include BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Pacific Biosciences, SOLiD, 10X Genomics GemCodeTM WGS, and Illumina paired-end, mate-pair, and synthetic long read. Cell lines, DNA, and genome sequences of these individuals are publicly available and highly characterized. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Genomics, Reference Materials, Reference Data, DNA sequencing, Bioinformatics