ApHMM

Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis

Journal Article (2024)
Author(s)

Can Firtina (ETH Zürich)

Kamlesh Pillai (Intel Corporation)

Gurpreet S. Kalsi (Intel Corporation)

Bharathwaj Suresh (Intel Corporation)

Damla Senol Cali (Carnegie Mellon University)

Jeremie S. Kim (ETH Zürich)

Taha Shahroodi (TU Delft - Computer Engineering)

Meryem Banu Cavlak (ETH Zürich)

Joël Lindegger (ETH Zürich)

undefined More Authors (External organisation)

Research Group
Computer Engineering
DOI related publication
https://doi.org/10.1145/3632950
More Info
expand_more
Publication Year
2024
Language
English
Research Group
Computer Engineering
Issue number
1
Volume number
21
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. Accurate computation of these probabilities is essential for the correct identification of sequence similarities. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. When we analyze state-of-the-art works, we identify an urgent need for a flexible, high-performance, and energy-efficient hardware-software co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM employs hardware-software co-design to tackle the major inefficiencies in the Baum-Welch algorithm by (1) designing flexible hardware to accommodate various pHMM designs, (2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, (3) rapidly filtering out unnecessary computations using a hardware-based filter, and (4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55×–260.03×, 1.83×–5.34×, and 27.97× when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: (1) error correction, (2) protein family search, and (3) multiple sequence alignment, by 1.29×–59.94×, 1.03×–1.75×, and 1.03×–1.95×, respectively, while improving their energy efficiency by 64.24×–115.46×, 1.75×, and 1.96×.