Metagenomic assembly through the lens of validation: a review of recent advances in assessing and improving the quality of genes and genomes assembled from metagenomes.
Nathanael D. Olson, Chris M. Hill, Victoria Cepeda-Espinoza, Jay Ghurye, Sergey Koren, Todd J. Treangen, Mihai Pop
Metagenomic samples are snapshots of complex ecosystems at work. These samples often involve hundreds of bacteria from known and unknown species, contain multiple strain variants, and vary greatly both within and across environments. Many microbes found in microbial communities are not easily grown in culture, making their DNA sequence our only clue to their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Genes and genomes are at two ends of the assembly spectrum: the former arguably represent the smallest units for analyzing organismal function, while the latter represent the ideal objective of one contig per replicon. In between these two extremes, current methods have made significant strides in reconstructing DNA comprising operons, tandem gene arrays, and syntenic blocks. Metagenomic assembly has come a long way over the past decade. Shorter-read, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines, and assemblers have appeared in recent years. Due to the inherent complexity of metagenome assembly, metagenome assemblies contain errors regardless of the assembly algorithm and sequencing method. Recent developments in assembly validation tools have played a pivotal role in improving metagenomic assemblers. Here we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation, and introduce a new validation pipeline (VALET) that uses de novo metrics for assessing metagenomic assembly quality in the absence of ground-truth reference datasets. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation.
Olson, N., Hill, C., Cepeda-Espinoza, V., Ghurye, J., Koren, S., Treangen, T. and Pop, M., Metagenomic assembly through the lens of validation: a review of recent advances in assessing and improving the quality of genes and genomes assembled from metagenomes, Briefings in Bioinformatics, [online], https://doi.org/10.1093/bib/bbx098
(Accessed September 26, 2023)