Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines

Justin Zook; John G. Cleary; Len Trigg; Francisco De La Vega

Published

August 3, 2016

Abstract

To evaluate and compare the performance of variant calling methods and their confidence scores, a test call set must be compared against a "gold standard" call set. Unfortunately, these comparisons are not straightforward with current Variant Call Format (VCF) files, which are the standard output of most variant calling algorithms for high-throughput sequencing data. Comparisons of VCFs are often confounded by the different possible representations of indels, multi-nucleotide polymorphisms (MNPs), and combinations of these with single-nucleotide variants (SNVs) in complex regions of the genome, which can produce misleading results. A variant caller is inherently a classification method: it assigns each putative variant a confidence score that can be thresholded to control the rate of false positive (FP) or false negative (FN) variants for a given application. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) are effective metrics for evaluating a test call set against the gold standard. For VCF data, however, computing them also requires special accounting for discrepant variant representations. We developed a novel algorithm for comparing variant call sets that handles complex discrepancies in call representation and uses a dynamic programming method to minimize false positives and false negatives globally across the entire call sets, enabling accurate performance evaluation of VCFs.
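The core idea, that two differently written sets of calls can describe the same sequence, can be illustrated by replaying each set of calls onto the reference and comparing the resulting haplotypes. The Python sketch below is a simplified illustration, not the authors' algorithm. It assumes a single haploid sequence per region (diploid genotypes are ignored), 0-based positions, and a small cluster of nearby variants, and it replaces the dynamic programming search described in the abstract with an exhaustive search over subsets of calls, which is exponential in the number of variants. The names `Variant`, `apply_variants`, and `best_match` are hypothetical.

```python
from itertools import combinations
from typing import Iterable, NamedTuple, Optional, Sequence, Tuple


class Variant(NamedTuple):
    """Simplified VCF-style record (hypothetical): 0-based POS, REF, ALT."""
    pos: int
    ref: str
    alt: str


def apply_variants(reference: str, variants: Iterable[Variant]) -> Optional[str]:
    """Replay variants onto the reference to build one haploid sequence.

    Assumption: a single haplotype only. Returns None when variants overlap
    or a REF allele does not match the reference, because such a combination
    cannot form a valid haplotype.
    """
    pieces = []
    cursor = 0  # first reference base not yet copied into the haplotype
    for v in sorted(variants):
        if v.pos < cursor or reference[v.pos:v.pos + len(v.ref)] != v.ref:
            return None
        pieces.append(reference[cursor:v.pos])  # unchanged bases before the variant
        pieces.append(v.alt)                    # substitute ALT for REF
        cursor = v.pos + len(v.ref)
    pieces.append(reference[cursor:])
    return "".join(pieces)


def best_match(reference: str, baseline: Sequence[Variant],
               calls: Sequence[Variant]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for one small region of nearby variants.

    Exhaustive stand-in for a dynamic programming search (assumption): tries
    every subset of baseline and test calls and keeps the pair of subsets that
    yield identical haplotypes while using the most records, which is the pair
    that minimizes FP + FN. TP is counted on the test-call side.
    """
    best_nb, best_nc = 0, 0  # matched baseline records, matched test calls
    for nb in range(len(baseline) + 1):
        for b_sub in combinations(baseline, nb):
            target = apply_variants(reference, b_sub)
            if target is None:
                continue
            for nc in range(len(calls) + 1):
                if nb + nc <= best_nb + best_nc:
                    continue  # cannot improve on the best pair found so far
                for c_sub in combinations(calls, nc):
                    if apply_variants(reference, c_sub) == target:
                        best_nb, best_nc = nb, nc
                        break
    tp = best_nc
    fp = len(calls) - best_nc
    fn = len(baseline) - best_nb
    return tp, fp, fn


if __name__ == "__main__":
    # Baseline MNP CG>TT versus the same change called as two adjacent SNVs.
    print(best_match("ACGTAC", [Variant(1, "CG", "TT")],
                     [Variant(1, "C", "T"), Variant(2, "G", "T")]))  # (2, 0, 0)
    # One T deleted from a TTT run, written at two different positions.
    print(best_match("CATTTG", [Variant(1, "AT", "A")],
                     [Variant(3, "TT", "T")]))  # (1, 0, 0)
```

In both examples a position-by-position comparison would report false positives and false negatives, whereas comparing replayed haplotypes counts every call as a true positive. Once each call has been labeled this way, sorting calls by confidence score and accumulating true and false positives at each threshold gives the points of an ROC-style curve.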

Citation

bioRxiv

Pub Weblink

https://doi.org/10.1101/023754

Pub Type

Websites

Keywords

DNA sequencing, Reference Materials, Benchmarking, genomics

Software research, Reference materials, Health, Genomics, Clinical diagnostics and Bioinformatics

Citation

Zook, J., Cleary, J., Trigg, L. and De La Vega, F. (2016), Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines, bioRxiv, [online], https://doi.org/10.1101/023754 (Accessed July 27, 2025)

Created August 3, 2016, Updated October 17, 2023