Summary

Frequently Asked Questions about the Genome in a Bottle Consortium, NIST's human genome reference materials, and data resources

Description

  1. What's the difference between NIST Reference Material DNA and DNA/cells from Coriell?
    1. NIST worked with Coriell to grow large batches of cells, extract DNA, mix the DNA well, and aliquot it into thousands of vials, which are the NIST Reference Materials for HG001-HG005.  These were characterized under the NIST quality system. They may differ from the DNA at Coriell, which comes from different batches of cells, though in general these differences are expected to be small.  The NIST price is higher because it incorporates some of the costs of the NIST quality system and the extensive NIST/GIAB characterization of these samples.
  2. If I want to start with one GIAB genome, which one should I choose?
    1. GIAB currently develops new benchmarks first on the PGP Ashkenazi Jewish son HG002 (NIST RM 8391), since it has the most extensive trio data and is covered by the broad consent of the PGP.  This currently includes benchmarks extending our small variant and structural variant calls. Over 50 commercial products based on this cell line are also available.  Therefore, we recommend that you start with HG002/RM 8391, though it is often helpful to use all seven of the GIAB genomes.
  3. Can I use GIAB for exome and targeted gene panel sequencing?
    1. Yes, our benchmarks can be used to assess targeted exome and gene panel sequencing.  You generally will want to subset to your regions of interest, e.g., using the --target-regions option in hap.py.  One important limitation is that our benchmarks contain limited numbers of difficult small variants and structural variants in exons, and targeted panels cover even fewer of them, so it is especially important to calculate confidence intervals for performance metrics like precision and recall.  One resource for more challenging variants in clinically important regions is described in https://doi.org/10.1101/335950.
  4. Can I use GIAB data in my publications?
    1. Yes, we encourage all to use these data. All GIAB data is made available with no embargo on publications using the data.  We ask that you cite the appropriate reference in the README for each dataset and/or our data publication (https://www.nature.com/articles/sdata201625).  When using our benchmarks, please cite our small variant characterization paper (https://doi.org/10.1038/s41587-019-0074-6), our best practices for benchmarking paper (https://www.nature.com/articles/s41587-019-0054-x), and/or our SV characterization paper (https://doi.org/10.1101/664623).
  5. Which references did GIAB use for GRCh37/hg19 and GRCh38/hg38?
    1. GRCh37 reference with decoy: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
    2. GRCh38 reference with no ALT loci: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
  6. Has GIAB characterized any tumor cell lines?
    1. Currently there are no tumor/normal cell lines characterized by NIST or GIAB, although NIST is exploring possibilities for developing appropriately consented tumor/normal cell lines for reference material development.  NIST has characterized CNVs in EGFR, HER2, and MET in several tumor cell lines in SRM 2373 and RM 8366.  The Medical Device Innovation Consortium has put together a Somatic Reference Sample Landscape Report that describes many of the somatic reference samples available as of early 2019 (https://mdic.org/wp-content/uploads/2019/03/MDIC-SRS-Landscape-Analysis-Report.pdf).
  7. How can I get involved in GIAB?
    1. A good first step to learn about active work is to sign up for the general GIAB and analysis team Google Groups and read the recent emails there:
      1. Analysis Team: https://groups.google.com/forum/#!forum/giab-analysis-team
      2. General GIAB: https://groups.google.com/forum/#!forum/genome-in-a-bottle
  8. Why are there variants in the benchmark vcfs outside the benchmark bed files?
    1. We include some variants outside the benchmark bed file because they reduce the risk of our benchmark including only part of a complex variant (e.g., when one indel is just inside the bed and one is just outside).  These complex variants can often be represented in multiple ways in the vcf file, and it is important that the benchmark vcf include all parts of a complex variant, even if part falls outside the bed, in order to ensure that benchmarking tools will not erroneously count different, but correct, representations of the complex variant as incorrect.
  9. What is the difference between "high-confidence variants and regions" and "benchmark variants and regions"?
    1. In 2018, we decided to change the terminology for our vcf and bed files from "high-confidence" to "benchmark" in order to more clearly convey their intended use for benchmarking performance.  Although we do still have high confidence that the variants are largely true, "high-confidence regions" was sometimes interpreted as meaning that everyone should have confidence in their own variant calls in these regions.  Especially as we expand to more difficult regions, our benchmark regions will contain variants and regions that are difficult to characterize for some methods.  In fact, our benchmark variants and regions are intended to enable anyone to determine how well any method performs for different types of variants and genome contexts within our benchmark regions.
  10. Where can I report potential errors in the GIAB calls?
    1. We have Google forms for reporting small variant errors at https://forms.gle/JcYmJSMTdRfXMvcUA and structural variant errors at https://forms.gle/hmTHtgyRzHozwT4C6, and you can also email Justin Zook at NIST.
  11. GIAB and IGSR/HGSVC have both characterized trios of Chinese ancestry. Does the GIAB proband with ID "HG005_NA24631" correspond to the IGSR/HGSVC sample with ID "HG00512"?
    1. No, these are in fact different trios of Chinese ancestry.  The GIAB Ashkenazi and Chinese trios are from the Personal Genome Project, since they are more broadly consented, including for commercial redistribution, development of iPSCs, etc.  For SVs, we developed the first benchmark that enables both sensitivity and specificity assessment for the son in the Ashkenazi trio (HG002) – see https://doi.org/10.1101/664623.
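
As a rough illustration of the confidence intervals recommended in question 3, the sketch below computes precision and recall with Wilson score intervals from true positive, false positive, and false negative counts. It is not part of any GIAB tool: the function names and the example counts are hypothetical, the Wilson interval is only one of several reasonable choices, and it assumes a single true positive count is used for both metrics (benchmarking tools may report separate truth-side and query-side counts).

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def precision_recall_with_ci(tp, fp, fn, z=1.96):
    """Return (estimate, (low, high)) for precision and for recall.

    Assumption: the same tp count is used for both metrics.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": (precision, wilson_interval(tp, tp + fp, z)),
        "recall": (recall, wilson_interval(tp, tp + fn, z)),
    }


if __name__ == "__main__":
    # Hypothetical counts for a small gene panel, not real GIAB results.
    metrics = precision_recall_with_ci(tp=48, fp=1, fn=2)
    for name, (estimate, (low, high)) in metrics.items():
        print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

With only a few dozen benchmark variants in the targeted regions, the interval is wide even when the point estimate looks high, which is why reporting the interval alongside the estimate matters for small panels.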

 

Created April 4, 2019, Updated January 31, 2020