High-coverage, long-read sequencing of Chinese trio reference samples

Published: June 14, 2019

Author(s)

Justin M. Zook, Nathanael D. Olson, Marc L. Salit, Aaron Wenger, Chunlin Xiao, Robert Sebra

Abstract

Genome in a Bottle (GIAB) is a consortium hosted by the National Institute of Standards and Technology whose primary objective is the development and characterization of human genomic reference materials. The consortium includes representatives from government, industry, and academia. Currently, the GIAB portfolio includes seven genomes: the pilot genome NA12878 and two son-father-mother trios, one of Ashkenazi Jewish descent and the other of Han Chinese descent (dx.doi.org/10.1038/sdata.2016.25). The trio samples were selected from the Personal Genome Project with the aim of increasing reference sample diversity (https://www.nist.gov/publications/conference-report-representing-ethnic-diversity-precision-medicine). The GIAB genomes have been extensively sequenced on a number of different platforms (dx.doi.org/10.1038/sdata.2016.25). The resulting datasets have been used to generate benchmark variant call sets for benchmarking and validating small variant calling methods (integration paper: https://doi.org/10.1101/281006; benchmarking paper: https://doi.org/10.1101/270157). The benchmark calls are based primarily on short-read data and cover approximately 90% of the human reference genome (dx.doi.org/10.1038/sdata.2016.25). A number of medically relevant genes are difficult to characterize using short-read sequencing data (https://www.nature.com/articles/gim201658). Therefore, expanding the benchmark regions using long-read sequencing technologies is of interest to the consortium and its stakeholders.
Citation: Scientific Data
Volume: 6
Pub Type: Journals

Keywords

human genomics, DNA sequencing, long reads, reference materials
Created June 14, 2019, Updated July 15, 2019