Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines

Published

Author(s)

Justin Zook, John G. Cleary, Len Trigg, Francisco De La Vega

Abstract

To evaluate and compare the performance of variant calling methods and their confidence scores, a test call set must be compared against a "gold standard". Unfortunately, these comparisons are not straightforward with the current Variant Call Files (VCF), which are the standard output of most variant calling algorithms for high-throughput sequencing data. Comparisons of VCFs are often confounded by the different representations of indels, MNPs, and combinations thereof with SNVs in complex regions of the genome, resulting in misleading results. A variant caller is inherently a classification method designed to score putative variants with a confidence score that can be used to control the rate of false positive (FP) or false negative (FN) variants for a given application. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) are efficient metrics to evaluate a test call set against the gold standard. However, in the case of VCF data this also requires special accounting to deal with discrepant representations. We developed a novel algorithm for comparing variant call sets that deals with complex call representation discrepancies and, through a dynamic programming method, minimizes false positives and false negatives globally across the entire call sets for accurate performance evaluation of VCFs.
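The representation problem described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm and involves no dynamic programming; it only shows why a record-by-record comparison can mislead. The names `Variant`, `apply_variants`, and `same_haplotype`, the toy reference sequences, and the simplified record layout (0-based position, REF, ALT) are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical simplified VCF record: (0-based position, REF allele, ALT allele).
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Replay non-overlapping variants onto the reference and return the haplotype."""
    pieces = []
    cursor = 0  # first reference base not yet copied or replaced
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(reference[cursor:pos])  # unchanged bases before the variant
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # unchanged tail
    return "".join(pieces)


def same_haplotype(reference: str, a: List[Variant], b: List[Variant]) -> bool:
    """True if both call sets produce the same sequence when replayed."""
    return apply_variants(reference, a) == apply_variants(reference, b)


# Example 1: one MNP versus two adjacent SNVs.
reference = "GATTACAGT"
as_mnp = [(4, "AC", "GT")]
as_snvs = [(4, "A", "G"), (5, "C", "T")]

# Example 2: the same one-base deletion in a homopolymer, placed differently.
repeat_reference = "CAAAT"
left_aligned = [(0, "CA", "C")]
right_shifted = [(2, "AA", "A")]

# Naive comparison of records finds nothing in common...
shared_records = set(as_mnp) & set(as_snvs)
# ...although the replayed haplotypes are identical.
mnp_match = same_haplotype(reference, as_mnp, as_snvs)
indel_match = same_haplotype(repeat_reference, left_aligned, right_shifted)
```

In both examples a naive comparison would count every call as both a false positive and a false negative, although the two call sets describe the same sequence. Comparing replayed haplotypes avoids this for small, isolated cases; resolving such discrepancies across entire call sets is the harder problem the abstract addresses.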
Citation
biorxiv

Keywords

DNA sequencing, Reference Materials, Benchmarking, genomics

Citation

Zook, J., Cleary, J., Trigg, L. and De La Vega, F. (2016), Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines, biorxiv, [online], https://doi.org/10.1101/023754 (Accessed April 27, 2024)
Created August 3, 2016, Updated October 17, 2023