Summary

This project is part of the Genome in a Bottle (GIAB) Consortium, hosted by NIST, and is dedicated to authoritative characterization of benchmark cancer genomes. It complements NIST's current DNA copy number Reference Materials for HER2, as well as EGFR and MET. Sign up for the General GIAB and Analysis Team email lists.

Interested in job opportunities or collaborations with us? Contact justin.zook [at] nist.gov (Justin Zook).
Click here for the GIAB FAQ

Description

Goals:

This project is an extension of the Genome in a Bottle Consortium to develop the technical infrastructure (reference standards, reference methods, and reference data) to enable translation of cancer genome sequencing to clinical practice and innovations in technologies. The priority of GIAB is authoritative characterization of human genomes for use in benchmarking, including analytical validation and technology development, optimization, and demonstration.

Reference Samples:

NIST has been collaborating with Andrew Liss at MGH to develop new tumor cell lines with paired normal samples that are explicitly consented for fully public dissemination of genomic data and cell lines. The first tumor cell line (HG008-T) is from a pancreatic ductal adenocarcinoma, for which we have paired normal pancreatic (HG008-N-P) and duodenal (HG008-N-D) tissue for sequencing, but no normal cell line. We are currently collecting the extensive genomic data described below and are working toward making these cell lines available in public repositories. We plan to add another pancreatic tumor cell line with a paired normal cell line in the near future, but these are still under development. We also welcome additional collaborations for tumor and normal cell line pairs that are explicitly consented for fully public dissemination of genomic data and cell lines.

Benchmark (or "High-confidence") Variant Calls and Regions:

We will be working with the GIAB community to develop benchmark variants for the tumor and normal samples, using assembly-based and mapping-based approaches. We welcome collaborations in this new project.


Sequencing Data:

Starting in Fall 2023, we began collecting diverse long- and short-read paired tumor and normal sequencing data for the GIAB HG008 samples. We are making the data public, without embargo, as we collect them. The data being collected are described in Table 1 (long range) and Table 2 (short range). We welcome collaborations to analyze these data.

Data Access:

These contributed data can be accessed through the public GIAB FTP as they become available. We also provide the Cancer GIAB Data Manifest, which allows exploration of the tumor and normal data currently available on the FTP. To explore the manifest, you can create a filter view by (1) selecting the entire spreadsheet and (2) choosing Data → Filter views → Create new filter view. Please note that tumor data collected in year 1 (2022) are from a prior passage of tumor cells and are highlighted in red. All tumor data currently being collected are from a large batch of tumor cells known as 0823p23. Please take these passages into consideration when choosing which tumor datasets to explore.
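For readers who prefer to work with an exported copy of the manifest rather than a spreadsheet filter view, the passage selection described above might be sketched as follows. This is only an illustration: the column names (`sample`, `technology`, `passage`, `path`) and the three rows are invented placeholders, not the real layout or contents of the Cancer GIAB Data Manifest, so they would need to be adapted to the actual export.

```python
import csv
import io

# Hypothetical manifest excerpt. The column names and rows are invented for
# illustration and do NOT reflect the real Cancer GIAB Data Manifest.
MANIFEST_CSV = """sample,technology,passage,path
HG008-T,example-tech-A,2022,placeholder/path/a
HG008-T,example-tech-B,0823p23,placeholder/path/b
HG008-N-D,example-tech-B,,placeholder/path/c
"""


def select_tumor_passage(manifest_text, passage):
    """Keep tumor rows from the requested passage and all normal-sample rows."""
    rows = csv.DictReader(io.StringIO(manifest_text))
    selected = []
    for row in rows:
        # Assumption: tumor samples are identified by a "-T" suffix.
        is_tumor = row["sample"].endswith("-T")
        if not is_tumor or row["passage"] == passage:
            selected.append(row)
    return selected


current = select_tumor_passage(MANIFEST_CSV, "0823p23")
for row in current:
    print(row["sample"], row["technology"], row["path"])
```

Normal-tissue rows are kept regardless of passage because the passage distinction described above applies only to the tumor cell line.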

Table 1:  Long Range Datasets
Pending and available long range datasets for HG008 tumor and normal samples. Available datasets have been QC'd and are listed with their estimated coverage and read lengths. Coverage estimates reflect the expected coverage of diploid regions of the normal and tumor samples, assuming no whole genome duplication has occurred. If multiple datasets are available, they are denoted by "(dataset #)" next to the stats. See the GIAB FTP or data manifest linked above for access to the available data. This table is updated as data are received; last update 2024-03-27.
| Estimated Read Lengths | Technology | HG008 tumor cell line (T), 2022 passages | HG008 tumor cell line (T), large batch 0823p23 | HG008 normal duodenal tissue (N-D) | HG008 normal pancreatic tissue (N-P) |
|---|---|---|---|---|---|
| ~100 - 300 kb | Oxford Nanopore Technologies (UL) | NA | ~54X, N50 127kb | pending | pending |
| ~10 - 100 kb | Oxford Nanopore Technologies (duplex) | NA | pending | pending | pending |
| ~35 kb | Oxford Nanopore Technologies (std) | NA | ~63X, N50 35kb | pending | pending |
| ~10 - 20 kb | PacBio HiFi (Revio) | NA | ~116X, N50 18kb | pending | ~35X, N50 17kb |
| 150 kb - multi Mb | Bionano Optical Mapping | NA | available | NA | NA |
| 2x150 bp | Arima and Phase Genomics HiC-Illumina | Phase Genomics available | Arima in QC | Arima in QC | NA |
| chromosomal | Karyologic karyotyping | available | NA | NA | NA |

Table 2:  Short Range Datasets
Pending and available short range datasets for HG008 tumor and normal samples. Available datasets have been QC'd and are listed with their estimated coverage and read lengths. Coverage estimates reflect the expected coverage of diploid regions of the normal and tumor samples, assuming no whole genome duplication has occurred. If multiple datasets are available, they are denoted by "(dataset #)" next to the stats. See the GIAB FTP or data manifest linked above for access to the available data. This table is updated as data are received; last update 2024-03-27.
| Estimated Read Lengths | Technology | HG008 tumor cell line (T), 2022 passages | HG008 tumor cell line (T), large batch 0823p23 | HG008 normal duodenal tissue (N-D) | HG008 normal pancreatic tissue (N-P) |
|---|---|---|---|---|---|
| 2x150 bp | Illumina WGS | (1) ~191X, 2x150bp; (2) NA | (1) in QC; (2) ~100X, 2x150bp (in final QC) | (1) in QC; (2) ~100X, 2x150bp (in final QC) | (1) ~150X, 2x150bp; (2) NA |
| 150 bp | Element - AVITI - short insert (~350 bp) | NA | ~87X, 2x150bp | 61X, 2x150bp | NA |
| 150 bp | Element - AVITI - long insert (1000+ bp) | NA | pending | pending | NA |
| 100 - 200 bp | PacBio Onso | NA | pending | pending | NA |
| ~300 bp | Ultima UG100 | NA | in QC | in QC | NA |
| 50 bp | BioSkryb single-cell WGS - Illumina | NA | <<1X, 120 cells, 2x50bp | NA | NA |
| ~300 bp | BioSkryb single-cell WGS - Ultima | NA | in QC | NA | NA |
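
The tables above summarize each dataset by estimated coverage and, for long reads, read-length N50. The page does not state how these figures were computed, so the following is only a minimal sketch of the conventional definitions: N50 as the read length at which reads of that length or longer contain at least half of all sequenced bases, and coverage as total sequenced bases divided by an assumed genome size. The function names, the toy read lengths, and the toy genome size are illustrative assumptions, not GIAB values.

```python
def n50(read_lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(read_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("read_lengths must not be empty")


def estimated_coverage(read_lengths, genome_size_bp):
    """Mean depth: total sequenced bases divided by the assumed genome size."""
    return sum(read_lengths) / genome_size_bp


# Toy values chosen only to make the arithmetic easy to follow.
toy_reads = [2, 2, 3, 3, 4, 6]   # 20 bases in total
toy_genome_size = 10             # hypothetical genome size in bases

toy_n50 = n50(toy_reads)
toy_cov = estimated_coverage(toy_reads, toy_genome_size)
print(toy_n50, toy_cov)
```

In the toy input, the two longest reads (6 and 4) already account for 10 of the 20 bases, so the N50 is 4, and 20 bases over a 10-base genome gives 2X coverage. For a tumor sample, a genome-wide average like this can differ from the coverage of diploid regions that the tables report, because copy number varies across the tumor genome.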

Research Opportunities:

NIST-NRC Postdoctoral Fellowship: a 2-year fellowship at NIST, open to U.S. citizens only, with a ~$75,000 salary plus benefits and relocation expenses included. Application deadlines are Feb. 1 and Aug. 1, and the application requires a 10-page research proposal. Contact Justin Zook if you are interested in writing a proposal on a genomics research project. We have opportunities posted for metrology in Cancer Genomics, Diploid Assembly, Epigenomics and Transcriptomics, Biological Data Science/Machine Learning, and Precision Medicine.

GIAB Email Lists:
General announcements
Analysis Team

Created October 18, 2023, Updated April 16, 2024