The Genome in a Bottle Consortium is a public-private-academic consortium hosted by NIST to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of whole human genome sequencing to clinical practice and innovations in technologies. The priority of GIAB is authoritative characterization of human genomes for use in benchmarking, including analytical validation and technology development, optimization, and demonstration.
GIAB has so far characterized a pilot genome (NA12878/HG001) from the HapMap project and two son/father/mother trios, one of Ashkenazi Jewish and one of Han Chinese ancestry, from the Personal Genome Project (selected because, unlike the pilot genome, they are consented for commercial redistribution). These samples and their IDs from NIST, Coriell, and PGP are in this table.
Benchmark (or "High-confidence") variant calls and regions:
We developed an integration pipeline that combines sequencing data from multiple technologies to produce variant calls and regions for benchmarking and validating variant calling pipelines. Currently, benchmark VCF and BED files for small variants are available for GRCh37 and GRCh38 under each genome at ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/
GIAB's versions of GRCh37 and GRCh38 reference fasta files, including a new GRCh38 reference in collaboration with the GRC that masks false duplications in GRCh38, are at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/.
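The benchmark VCF is meant to be interpreted only inside the regions listed in the matching benchmark BED file. As a minimal sketch of that relationship (not the GIAB integration pipeline, and not a replacement for a dedicated benchmarking tool), the Python below keeps only VCF records whose position falls inside a BED region. The function names `load_bed`, `in_regions`, and `filter_vcf_lines` are hypothetical, the coordinates are toy values, and the sketch assumes non-overlapping BED intervals and checks only the POS field of each record, so variants spanning a region boundary are not handled.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]}.

    BED coordinates are 0-based and half-open: [start, end).
    """
    regions = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    for chrom in regions:
        regions[chrom].sort()
    return dict(regions)


def in_regions(regions, chrom, pos):
    """Return True if 1-based VCF position `pos` lies inside a region.

    Assumes the intervals for each chromosome do not overlap.
    """
    intervals = regions.get(chrom, [])
    zero_based = pos - 1  # VCF POS is 1-based; BED is 0-based
    # Index of the last interval whose start is <= zero_based.
    i = bisect.bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]


def filter_vcf_lines(vcf_lines, regions):
    """Yield header lines plus records whose POS is inside a region."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if in_regions(regions, fields[0], int(fields[1])):
            yield line


# Toy data for illustration only (not real GIAB coordinates).
toy_bed = ["chr1\t100\t200", "chr1\t300\t400"]
toy_vcf = [
    "##fileformat=VCFv4.2",
    "chr1\t150\t.\tA\tG\t.\tPASS\t.",
    "chr1\t250\t.\tC\tT\t.\tPASS\t.",
]
kept = list(filter_vcf_lines(toy_vcf, load_bed(toy_bed)))
```

The off-by-one conversion between 1-based VCF positions and 0-based BED intervals is the step most worth checking when writing a filter like this by hand.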
New benchmarks for difficult variants and regions:
Structural variants: Currently available for HG002 on GRCh37
Small variants in more difficult regions: v4.2.1 is currently available for HG002, HG003, and HG004 on GRCh37 and GRCh38 (manuscript)
MHC: Included in v4.2.1 small variant benchmark for HG002-HG004 (Manuscript describing MHC benchmark)
Benchmarking best practices:
To establish best practices for using GIAB genomes for benchmarking, we have worked with the Global Alliance for Genomics and Health Benchmarking Team:
Benchmarking tools, Manuscript, GitHub
Stratification Bed Files for Difficult Regions (cite precisionFDA manuscript below)
precisionFDA Truth Challenge V2 Manuscript and data/vcfs are an example of small variant benchmarking with v4.2 and stratifications
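Benchmarking against a GIAB call set is usually summarized as precision, recall, and F1, often reported separately for each stratification. The sketch below shows those standard formulas in Python under simplifying assumptions: a single true-positive count is used for both precision and recall (comparison tools may count truth and query true positives separately), the class `Counts` and function `compute_metrics` are hypothetical names, and the stratum labels and counts are invented for illustration rather than taken from any GIAB result.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # calls matching the benchmark
    fp: int  # calls inside benchmark regions that do not match
    fn: int  # benchmark variants that were missed


def compute_metrics(c):
    """Return precision, recall, and F1 for one set of counts."""
    nan = float("nan")
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else nan
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else nan
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total > 0 else nan
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented counts per hypothetical stratum, for illustration only.
toy_counts = {
    "all_benchmark_regions": Counts(tp=90, fp=10, fn=30),
    "hypothetical_difficult_stratum": Counts(tp=40, fp=10, fn=20),
}
per_stratum = {name: compute_metrics(c) for name, c in toy_counts.items()}
```

Reporting the metrics per stratum rather than only genome-wide makes it visible when performance differs between easy and difficult regions.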
Data and analyses from most short-, linked-, and long-read sequencing methods are publicly available without publication embargo (data indexed in GIAB GitHub and FTP).
Links to bam and fastq files
Preprint with new short read genome and exome sequencing data
GIAB 2016 Scientific Data publication
NCBI GIAB Bioproject
NCBI SRA Run Selector
Amazon AWS S3 bucket: s3://giab
Ongoing and Future work:
Current work in the GIAB Analysis Team is focused on establishing assembly-based benchmarks for challenging medically relevant genes and other difficult regions. GIAB is also exploring expanding to additional samples consented for release of WGS and redistribution of commercial products: increasing the diversity of germline reference samples and developing paired tumor-normal cell lines.
The consortium was initiated in a set of meetings in 2011 and 2012 and holds open, public workshops, typically annually. Slides from workshops and conferences are available here. The consortium and workshops are open, and new participants are welcome.
Publications by GIAB:
Small variant benchmark including more difficult regions (currently available for HG002, HG003, and HG004 on GRCh37 and GRCh38)
Structural variant benchmark (currently available for HG002 on GRCh37)
Crowd-sourced, expert-curated structural variants
Benchmark small variants for 7 GIAB genomes
Older benchmark small variants for pilot genome
GIAB data collected through 2016
GA4GH Best practices for benchmarking germline small variants
Posters and Presentations by GIAB
Publications using GIAB:
All Google Scholar publications mentioning GIAB and NIST
NIST-NRC Postdoctoral Fellowship: a 2-year fellowship at NIST, open to US citizens only, with a salary of ~$72,000 plus benefits and relocation expenses included. Application deadlines are Feb. 1 and Aug. 1, and a 10-page research proposal is required. Contact Justin Zook if you are interested in writing a proposal on a genomics research project. We have opportunities posted for metrology in Cancer Genomics, Diploid Assembly, Epigenomics and Transcriptomics, Biological Data Science/Machine Learning, and Precision Medicine.