Benchmarking challenging small variants with linked and long reads
Justin Wagner, Nathanael David Olson, Lindsay Harris, Marc L. Salit, Fritz Sedlazeck, Chunlin Xiao, Justin Zook
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and to develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, the new benchmark covers 92% of the autosomal GRCh38 assembly, compared with 85% of GRCh38 in the previous version, while excluding regions that are problematic for benchmarking small variants (such as copy number variants) and that should not have been included in the previous version. The new benchmark identifies eight times more false negatives in a short-read variant call set than our previous benchmark did. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
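To illustrate what identifying false positives and false negatives against a benchmark means, the following minimal Python sketch compares a call set with a benchmark set inside a list of benchmark regions. It is not the method used in the paper: the `Variant` type, the `compare` function, the coordinates, and the exact-match rule are all assumptions made for illustration, and a real comparison generally has to reconcile different representations of the same variant.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    # Hypothetical minimal variant record (not a full VCF record).
    chrom: str
    pos: int
    ref: str
    alt: str


def in_regions(v, regions):
    # Regions are (chrom, start, end) half-open intervals, assumed to use
    # the same coordinate system as Variant.pos.
    return any(c == v.chrom and start <= v.pos < end for c, start, end in regions)


def compare(benchmark, calls, regions):
    # Only variants inside the benchmark regions are assessed.
    truth = {v for v in benchmark if in_regions(v, regions)}
    query = {v for v in calls if in_regions(v, regions)}
    tp = len(truth & query)   # called and present in the benchmark
    fn = len(truth - query)   # in the benchmark but missed by the caller
    fp = len(query - truth)   # called but absent from the benchmark
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"TP": tp, "FN": fn, "FP": fp, "recall": recall, "precision": precision}


# Toy data, invented for this example only.
regions = [("chr1", 1, 1000)]
benchmark = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 200, "C", "T"),
    Variant("chr1", 5000, "G", "A"),  # outside the benchmark regions, ignored
]
calls = [
    Variant("chr1", 100, "A", "G"),   # true positive
    Variant("chr1", 300, "T", "C"),   # false positive
]
result = compare(benchmark, calls, regions)
print(result)
```

In this sketch, widening `regions` to cover more of the genome brings additional benchmark variants into the comparison, which is how a broader benchmark can expose false negatives that a narrower one would not count.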
Wagner, J., Olson, N., Harris, L., Salit, M., Sedlazeck, F., Xiao, C. and Zook, J., Benchmarking challenging small variants with linked and long reads, Cell Genomics, [online], https://doi.org/10.1016/j.xgen.2022.100128 (Accessed September 29, 2023)