Summary

The Genome in a Bottle (GIAB) Consortium, hosted by NIST, is dedicated to authoritative characterization of benchmark human genomes. Sign up for the General GIAB and Analysis Team email lists, or see information about past public workshops. GIAB recently launched a new Cancer GIAB project. Interested in job opportunities or collaborations with us? Contact Justin Zook at the email in the right panel.
Click here for the GIAB FAQ

New GIAB Products (Nov 2023 to Aug 2024):

  1. v1.0 TR benchmark for HG002 indels and SVs >=5bp in tandem repeats on GRCh38 (publication)
  2. v1.0 XY benchmark for HG002 small variants in chromosomes X and Y on GRCh38 (preprint)
  3. v1.0 mosaic benchmark for HG002 SNVs on GRCh38
  4. GIAB v3 GRCh38 reference with masked false duplications and contamination, as well as decoy sequences from CHM13, which we are now using for GIAB analyses (refinement of previous work).
  5. v3.5 stratifications, including refinements to GRCh37 and GRCh38 (e.g., separate A/T and G/C homopolymer beds) as well as new stratifications for T2T-CHM13v2.0 (preprint)
  6. RNA sequencing data for the HG002 LCL and iPSCs, and HG004 and HG005 LCLs, including Illumina, PacBio Iso-seq and MAS-seq, and ONT direct RNA and cDNA
  7. Public Tumor/Normal data for a large batch of the HG008-T pancreatic tumor cell line, including Illumina WGS, HiFi WGS, Element WGS, Onso WGS, Ultima WGS, ONT WGS, Bionano, Kromatid, and Bioskryb single cell WGS, with more data on tumor and normal in process (see data preprint and new Cancer GIAB website)
  8. v1.1 T2T-HG002 Assembly in collaboration with T2T Consortium, which we will be using to create new benchmarks. We have uploaded preliminary draft small variant and SV benchmarks from v1.1.

Description

Consortium goals:
The Genome in a Bottle Consortium is a public-private-academic consortium hosted by NIST to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of whole human genome sequencing to clinical practice and innovations in technologies. The priority of GIAB is authoritative characterization of human genomes for use in benchmarking, including analytical validation and technology development, optimization, and demonstration.

Reference samples:
GIAB has so far characterized a pilot genome (NA12878/HG001) from the HapMap project and two son/father/mother trios of Ashkenazi Jewish and Han Chinese ancestry from the Personal Genome Project (selected because, unlike the pilot genome, they are consented for commercial redistribution). These samples and their IDs from NIST, Coriell, and PGP are in this table.

Benchmark (or "High-confidence") variant calls and regions:
We developed an integration pipeline that combines sequencing data from multiple technologies to produce variant calls and regions for benchmarking and validating variant calling pipelines. Currently, benchmark VCF and BED files for small variants are available for GRCh37 and GRCh38 under each genome at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/
GIAB's versions of GRCh37 and GRCh38 reference fasta files, including a new GRCh38 reference in collaboration with the GRC that masks false duplications in GRCh38, are at https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/.
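
As an illustration of how the benchmark BED files relate to the benchmark VCF files, the following is a minimal Python sketch of restricting variant positions to benchmark regions. It relies only on the standard conventions that BED intervals are 0-based and half-open while VCF positions are 1-based. The function names (parse_bed, in_regions) and the example coordinates are hypothetical, and the sketch assumes the intervals on each chromosome do not overlap. It checks a single position per variant, which is a simplification for indels and other multi-base variants, so it is not a substitute for the benchmarking tools described below.

```python
from bisect import bisect_right
from collections import defaultdict


def parse_bed(lines):
    """Parse BED text lines into {chrom: sorted list of (start, end)}.

    BED coordinates are 0-based and half-open: [start, end).
    """
    regions = defaultdict(list)
    for line in lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions[chrom].append((int(start), int(end)))
    for chrom in regions:
        regions[chrom].sort()
    return dict(regions)


def in_regions(regions, chrom, pos):
    """Return True if a 1-based VCF position lies inside a BED interval.

    Assumes (for this sketch) that intervals on a chromosome do not overlap.
    """
    intervals = regions.get(chrom, [])
    zero_based = pos - 1  # convert VCF 1-based position to BED 0-based
    # Index of the last interval whose start is <= zero_based.
    idx = bisect_right(intervals, (zero_based, float("inf"))) - 1
    return idx >= 0 and intervals[idx][0] <= zero_based < intervals[idx][1]


if __name__ == "__main__":
    # Hypothetical benchmark regions and variant positions, for illustration only.
    bed = ["chr1\t100\t200", "chr1\t300\t400"]
    regions = parse_bed(bed)
    for pos in (100, 101, 200, 201):
        print(pos, in_regions(regions, "chr1", pos))
```

Variants that fall outside the benchmark regions are normally excluded from comparison rather than counted as errors, since the benchmark makes no claim about those positions.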

New benchmarks for difficult variants and regions:
Structural variants: Currently available for HG002 on GRCh37 and in Challenging Medically Relevant Gene benchmark below
Small variants in more difficult regions: v4.2.1 is available for all 7 GIAB samples on GRCh37 and GRCh38 (manuscript).
MHC: Included in v4.2.1 small variant benchmark for HG001-HG007 (Manuscript describing MHC benchmark)
273 Challenging Medically Relevant Genes small variant and SV benchmarks in HG002 and Preliminary benchmark for T2T-CHM13v1.0

Benchmarking best practices:
To establish best practices for using GIAB genomes for benchmarking, we have worked with the Global Alliance for Genomics and Health Benchmarking Team:
Benchmarking tools: Manuscript, GitHub
Stratification Bed Files for Difficult Regions (cite precisionFDA manuscript below)
precisionFDA Truth Challenge V2 Manuscript and data/vcfs are an example of small variant benchmarking with v4.2 and stratifications
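
To make the role of stratifications concrete, here is a minimal Python sketch of summarizing benchmarking results per stratum with the standard definitions of precision, recall, and F1. It is not the GA4GH tooling itself: the function name benchmark_metrics, the stratum names, and all counts are hypothetical. It also assumes that true positive, false positive, and false negative counts have already been produced by a comparison tool that handles differences in variant representation.

```python
import math


def benchmark_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from variant comparison counts.

    tp: calls matching the benchmark; fp: calls not in the benchmark;
    fn: benchmark variants that were missed. Undefined ratios yield NaN.
    """
    precision = tp / (tp + fp) if (tp + fp) else math.nan
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    if math.isnan(precision) or math.isnan(recall) or (precision + recall) == 0:
        f1 = math.nan
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical per-stratum counts (tp, fp, fn), for illustration only.
    counts_by_stratum = {
        "all benchmark regions": (9000, 100, 200),
        "homopolymers": (400, 60, 90),
    }
    for stratum, (tp, fp, fn) in counts_by_stratum.items():
        precision, recall, f1 = benchmark_metrics(tp, fp, fn)
        print(f"{stratum}: precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
```

Reporting metrics separately for each stratification, rather than only genome-wide, shows whether errors are concentrated in difficult contexts that an overall figure would average away.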

Sequencing Data:
Data and analyses from most short, linked, and long read sequencing methods are publicly available without publication embargo (data indexed in GIAB GitHub and FTP).
Links to bam and fastq files
Preprint with new short read genome and exome sequencing data
Draft RNA sequencing data from short and long reads for selected GIAB LCLs and iPSCs
GIAB 2016 Scientific Data publication 
NCBI GIAB Bioproject
NCBI SRA Run Selector
Amazon AWS S3 bucket: s3://giab 

Ongoing and Future work:
Current work in the GIAB Analysis Team is focused on establishing assembly-based benchmarks for challenging genomic regions, including a collaboration with the Telomere-to-Telomere Consortium to create benchmarks from a high-quality complete diploid assembly of HG002. GIAB is also exploring transcriptomics and expanding to additional samples consented for release of WGS and redistribution of commercial products, both by increasing the diversity of germline reference samples and by developing paired tumor-normal cell lines.

Workshops:
The consortium was initiated in a set of meetings in 2011 and 2012, and it holds open, public workshops, typically annually. Slides from workshops and conferences are available here, and workshop summaries are also available. The consortium and workshops are open, and new participants are welcome.

Publications by GIAB:
Review of variant calling and benchmarking
Collaboration with Human Pangenome Reference Consortium developing a high-quality diploid assembly of HG002 and using GIAB benchmarks to evaluate assembly-based variant calls
Challenging Medically Relevant Genes small variant and SV benchmarks for HG002
Small variant benchmark including more difficult regions for 7 GIAB samples on GRCh37 and GRCh38
Structural variant benchmark (currently available for HG002 on GRCh37)
Deprecated MHC benchmark for HG002
Crowd-sourced, expert-curated structural variants for HG002
Deprecated v3.3.2 benchmark small variants for 7 GIAB samples
Deprecated v2.19 benchmark small variants for pilot genome
GIAB data collected through 2016
GA4GH Best practices for benchmarking germline small variants (example use in precisionFDA Challenge V2)
Posters and Presentations by GIAB

Publications using GIAB:
All google scholar publications mentioning GIAB and NIST
Collaborative publications with Telomere-to-Telomere Consortium using GIAB samples

GIAB Email Lists:
General announcements
Analysis Team

Blog about GIAB Work

Research Opportunities:
NIST-NRC Postdoctoral Fellowship: 2-year fellowship at NIST, US citizens only, ~$75,000 salary plus benefits, relocation expenses included, application deadlines are Feb. 1 and Aug. 1, requires 10 page research proposal. Contact Justin Zook if you are interested in writing a proposal on a genomics research project. We have opportunities posted for metrology in Cancer Genomics, Diploid Assembly, Epigenomics and Transcriptomics, Biological Data Science/Machine Learning, and Precision Medicine.

Created July 25, 2012, Updated September 24, 2024