Fast and exact gap-affine partial order alignment with POASTA

Journal Article (2025)
Author(s)

L.R. van Dijk (Broad Institute of MIT and Harvard, TU Delft - Pattern Recognition and Bioinformatics)

Abigail Manson (Broad Institute of MIT and Harvard)

AM Earl (Broad Institute of MIT and Harvard)

Kiran V. Garimella (Broad Institute of MIT and Harvard)

TEPMF Abeel (TU Delft - Pattern Recognition and Bioinformatics, Broad Institute of MIT and Harvard)

Research Group
Pattern Recognition and Bioinformatics
DOI related publication
https://doi.org/10.1093/bioinformatics/btae757
More Info
expand_more
Publication Year
2025
Language
English
Related content
Research Group
Pattern Recognition and Bioinformatics
Issue number
1
Volume number
41
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Motivation
Partial order alignment is a widely used method for computing multiple sequence alignments, with applications in genome assembly and pangenomics, among many others. Current algorithms to compute the optimal, gap-affine partial order alignment do not scale well to larger graphs and sequences. While heuristic approaches exist, they do not guarantee optimal alignment and sacrifice alignment accuracy.

Results
We present POASTA, a new optimal algorithm for partial order alignment that exploits long stretches of matching sequence between the graph and a query. We benchmarked POASTA against the state-of-the-art on several diverse bacterial gene datasets and demonstrated an average speed-up of 4.1x and up to 9.8x, using less memory. POASTA’s memory scaling characteristics enabled the construction of much larger POA graphs than previously possible, as demonstrated by megabase-length alignments of 342 Mycobacterium tuberculosis sequences.