Computational Methods for Strain-Level Microbial Detection in Colony and Metagenome Sequencing Data

Review (2020)
Author(s)

Christine Anyansi (TU Delft - Pattern Recognition and Bioinformatics, Broad Institute of MIT and Harvard)

Timothy J. Straub (Broad Institute of MIT and Harvard, Harvard T.H. Chan School of Public Health)

Abigail Manson (Broad Institute of MIT and Harvard)

Ashlee M. Earl (Broad Institute of MIT and Harvard)

Thomas Abeel (Broad Institute of MIT and Harvard, TU Delft - Pattern Recognition and Bioinformatics)

Research Group
Pattern Recognition and Bioinformatics
Copyright
© 2020 C.A. Anyansi, Timothy J. Straub, Abigail L. Manson, Ashlee M. Earl, T.E.P.M.F. Abeel
DOI related publication
https://doi.org/10.3389/fmicb.2020.01925
Publication Year
2020
Language
English
Volume number
11
Pages (from-to)
1-17
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Metagenomic sequencing is a powerful tool for examining the diversity and complexity of microbial communities. Most widely used tools for taxonomic profiling of metagenomic sequence data provide a species-level overview of community composition. However, individual strains within a species can differ greatly in key genotypic and phenotypic characteristics, such as drug resistance, virulence, and growth rate. The ability to resolve microbial communities down to the level of individual strains within a species is therefore critical to interpreting metagenomic data for clinical and environmental applications, where identifying a particular strain, or tracking it across a set of samples, can aid in clinical diagnosis and treatment, or in characterizing as-yet unstudied strains across novel environmental locations. Recently published approaches have begun to tackle the problem of resolving strains within a particular species in metagenomic samples. In this review, we present an overview of these new algorithms and their uses, including methods based on assembly reconstruction and methods operating with or without a reference database. While existing metagenomic analysis methods show reasonable performance at the species and higher taxonomic levels, identifying closely related strains within a species presents a bigger challenge, owing to the diversity of databases, the degree of genetic relatedness, and the differing goals of these analyses. The choice of metagenomic tool for a specific application should be made on a case-by-case basis, because each tool has strengths and weaknesses that affect its performance on specific tasks. A comprehensive benchmark across different use-case scenarios is vital to validate the performance of these tools on microbial samples. Because strain-level metagenomic analysis is still in its infancy, more fine-grained, high-resolution algorithms will continue to be in demand.