Approximate, simultaneous comparison of microbial genome architectures via syntenic anchoring of quiver representations

Journal Article (2018)
Author(s)

Alex N. Salazar (TU Delft - Pattern Recognition and Bioinformatics, Broad Institute of MIT and Harvard)

Thomas Abeel (TU Delft - Pattern Recognition and Bioinformatics, Broad Institute of MIT and Harvard)

Research Group
Pattern Recognition and Bioinformatics
Copyright
© 2018 A.N. Salazar, T.E.P.M.F. Abeel
DOI related publication
https://doi.org/10.1093/bioinformatics/bty614
Publication Year
2018
Language
English
Issue number
17
Volume number
34
Pages (from-to)
i732-i742
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Motivation: A long-standing limitation in comparative genomic studies is the dependency on a reference genome, which limits the spectrum of genetic diversity that can be identified across a population of organisms. This is especially true in the microbial world, where genome architectures can vary significantly. There is therefore a need for computational methods that can simultaneously analyze the architectures of multiple genomes without introducing bias from a reference.

Results: In this article, we present Ptolemy, a novel method for studying the diversity of genome architectures (such as structural variation and pan-genomes) across a collection of microbial assemblies without the need for a reference. Ptolemy is a 'top-down' approach to comparing whole-genome assemblies. Genomes are represented as labeled multi-directed graphs, known as quivers, which are then merged into a single, canonical quiver by identifying 'gene anchors' via synteny analysis. The canonical quiver represents an approximate, structural alignment of all genomes in a given collection, encoding structural variation across (sub)populations within the collection. We highlight various applications of Ptolemy by analyzing structural variation and the pan-genomes of different datasets composed of Mycobacterium, Saccharomyces, Escherichia and Shigella species. Our results show that Ptolemy is flexible and can handle both conserved and highly dynamic genome architectures. Ptolemy is user-friendly, requiring only FASTA-formatted assemblies along with their corresponding GFF-formatted files, and resource-friendly: it can align 24 genomes in ∼10 min with four CPUs and <2 GB of RAM.

Availability and implementation: GitHub: https://github.com/AbeelLab/ptolemy

Supplementary information: Supplementary data are available at Bioinformatics online.
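The idea of per-genome quivers collapsed into one canonical quiver through syntenic gene anchors can be illustrated with a small toy model. The Python sketch below is not Ptolemy's implementation and does not reproduce its synteny criterion. It assumes that each genome is a single contig given as an ordered list of (gene ID, gene family) pairs with family labels already assigned, standing in for the information Ptolemy derives from its FASTA and GFF input. The anchoring rule (two genes are merged if they share a family and their flanking windows share at least one family) is a hypothetical simplification. The names `build_quiver`, `gene_context` and `canonical_quiver` and the parameters `window` and `min_shared` are illustrative only.

```python
from collections import defaultdict


def build_quiver(genome_id, genes):
    """Toy quiver for one genome: arrows (src_gene, dst_gene, genome_id)
    linking consecutive genes along a single contig."""
    return [(a, b, genome_id) for (a, _), (b, _) in zip(genes, genes[1:])]


def gene_context(genes, window=2):
    """Map each gene ID to the set of gene families within `window` positions of it."""
    fams = [fam for _, fam in genes]
    return {
        gene: set(fams[max(i - window, 0):i] + fams[i + 1:i + 1 + window])
        for i, (gene, _) in enumerate(genes)
    }


def canonical_quiver(genomes, window=2, min_shared=1):
    """Merge per-genome quivers into one graph whose nodes are 'anchors'.

    Hypothetical synteny rule (an assumption, not Ptolemy's criterion): two genes
    are anchored together if they belong to the same family and their local
    contexts share at least `min_shared` families.
    """
    parent, family_of, context = {}, {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for genes in genomes.values():
        for gene, fam in genes:
            parent[gene] = gene
            family_of[gene] = fam
        context.update(gene_context(genes, window))

    # Only genes of the same family are candidates for the same anchor.
    by_family = defaultdict(list)
    for gene, fam in family_of.items():
        by_family[fam].append(gene)
    for members in by_family.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if len(context[a] & context[b]) >= min_shared:
                    parent[find(a)] = find(b)  # union the two anchor sets

    # Name each anchor by its family and member genes so labels are deterministic.
    members_of = defaultdict(list)
    for gene in parent:
        members_of[find(gene)].append(gene)
    name = {}
    for root, members in members_of.items():
        label = f"{family_of[root]}({'+'.join(sorted(members))})"
        for gene in members:
            name[gene] = label

    # Collapse arrows onto anchors; each canonical arrow records which genomes use it.
    arrows = defaultdict(set)
    for genome_id, genes in genomes.items():
        for src, dst, label in build_quiver(genome_id, genes):
            arrows[(name[src], name[dst])].add(label)
    return {edge: sorted(labels) for edge, labels in arrows.items()}


if __name__ == "__main__":
    genomes = {
        "g1": [("a1", "dnaA"), ("a2", "dnaN"), ("a3", "recF"), ("a4", "gyrB")],
        "g2": [("b1", "dnaA"), ("b2", "dnaN"), ("b3", "gyrB")],  # recF absent
    }
    for edge, labels in canonical_quiver(genomes).items():
        print(edge, labels)
```

In this toy example, the missing recF gene in g2 shows up as a bubble in the merged graph: both genomes share the dnaA to dnaN arrow, then take different paths before rejoining at the gyrB anchor. This illustrates, in a much simplified form, how a single canonical graph can encode structural variation across several genomes without choosing one of them as a reference.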