precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions
Nathanael David Olson, Jennifer McDaniel, Justin Wagner, Justin Zook
precisionFDA-hosted challenges provide a venue for inspiring development and comparative analysis of bioinformatic algorithms. The first "PrecisionFDA Truth Challenge", held in 2016, used Genome In A Bottle (GIAB) Consortium benchmarks to evaluate performance in regions accessible to short reads at that time. For this second truth challenge, we set out to assess variant calling performance for long- and short-read sequencing technologies, particularly in segmental duplications and other difficult-to-map regions, and in the Major Histocompatibility Complex (MHC). The challenge used benchmarks for the Personal Genome Project Ashkenazi Jewish trio characterized by GIAB: the son (HG002) and his parents (HG003 and HG004). The GIAB Consortium recently expanded these benchmarks to include challenging and previously inaccessible genomic regions by using high-accuracy long reads and linked reads. For the MHC region, GIAB created the benchmark using a novel assembly-based method. Prior to the challenge, the V4.1 HG002 benchmark was made available to participants as a known truth set for evaluating method performance. To minimize and detect over-training of algorithms to public benchmarks, the benchmark sets for HG003 and HG004 were released only after the submission deadline. In contrast to the first challenge, where 50X short-read data was the only dataset, long- and short-read datasets were made available for all three members of the trio: 35X Illumina, 35X PacBio HiFi, and 50X Oxford Nanopore Technologies (ONT) data. Starting with fastq files, challenge participants were asked to apply their variant calling pipeline and submit variant calls against GRCh38 for one or more sequencing technologies. There were 64 submissions from 20 participants. Participants primarily used Illumina (24 submissions) and PacBio data (17 submissions), with 20 submissions using multiple technologies. Submissions were benchmarked following best practices from the Global Alliance for Genom
Olson, N., McDaniel, J., Wagner, J. and Zook, J., precisionFDA Truth Challenge V2: Calling variants from short- and long-reads in difficult-to-map regions, Cell Genomics, [online], https://doi.org/10.1016/j.xgen.2022.100129 (Accessed March 5, 2024)