ApHMM

Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis

Journal Article (2024)

Author(s)

Can Firtina (ETH Zürich)

Kamlesh Pillai (Intel Corporation)

Gurpreet S. Kalsi (Intel Corporation)

Bharathwaj Suresh (Intel Corporation)

Damla Senol Cali (Carnegie Mellon University)

Jeremie S. Kim (ETH Zürich)

Taha Shahroodi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Meryem Banu Cavlak (ETH Zürich)

Joël Lindegger (ETH Zürich)

More Authors (External organisation)

Research Group

Computer Engineering

Bioinformatics, Genomics, Profile hidden Markov models, The Baum-Welch Algorithm

DOI related publication

https://doi.org/10.1145/3632950 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:dd163f10-70c5-4094-8e88-3ea6b754651c

Publication Year

2024

Language

English

Journal title

ACM Transactions on Architecture and Code Optimization

Issue number

1

Volume number

21

Article number

19

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, utilizes these probabilities to optimize and compute similarity scores. Accurate computation of these probabilities is essential for the correct identification of sequence similarities. However, the Baum-Welch algorithm is computationally intensive, and existing solutions offer either software-only or hardware-only approaches with fixed pHMM designs. When we analyze state-of-the-art works, we identify an urgent need for a flexible, high-performance, and energy-efficient hardware-software co-design to address the major inefficiencies in the Baum-Welch algorithm for pHMMs. We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM employs hardware-software co-design to tackle the major inefficiencies in the Baum-Welch algorithm by (1) designing flexible hardware to accommodate various pHMM designs, (2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, (3) rapidly filtering out unnecessary computations using a hardware-based filter, and (4) minimizing redundant computations. ApHMM achieves substantial speedups of 15.55×–260.03×, 1.83×–5.34×, and 27.97× when compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. 
ApHMM outperforms state-of-the-art CPU implementations in three key bioinformatics applications: (1) error correction, (2) protein family search, and (3) multiple sequence alignment, by 1.29×–59.94×, 1.03×–1.75×, and 1.03×–1.95×, respectively, while improving their energy efficiency by 64.24×–115.46×, 1.75×, and 1.96×.
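The abstract refers to computing a similarity score between a sequence and a pHMM from state and edge probabilities. As a rough illustration only, the forward and backward passes that the Baum-Welch algorithm builds on can be sketched for a plain two-state HMM. This is not the ApHMM design or code: the toy model (`INIT`, `TRANS`, `EMIT`) and the function names are assumptions made up for this example, and a real pHMM would add match, insertion, and deletion states per position, which the sketch omits.

```python
# Toy HMM over the DNA alphabet. All values are hypothetical and chosen
# only so that each probability row sums to 1.
INIT = [0.6, 0.4]                      # P(first state = i)
TRANS = [[0.7, 0.3],                   # TRANS[i][j] = P(next = j | current = i)
         [0.4, 0.6]]
EMIT = [{"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},   # EMIT[j][c] = P(c | state j)
        {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}]


def forward(obs, init, trans, emit):
    """Forward pass: alpha[t][j] = P(obs[0..t], state at t = j)."""
    n = len(init)
    alpha = [[init[j] * emit[j][obs[0]] for j in range(n)]]
    for symbol in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ])
    return alpha


def backward(obs, trans, emit):
    """Backward pass: beta[t][i] = P(obs[t+1..] | state at t = i)."""
    n = len(trans)
    beta = [[1.0] * n]                 # beta at the last position is 1
    for symbol in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [
            sum(trans[i][j] * emit[j][symbol] * nxt[j] for j in range(n))
            for i in range(n)
        ])
    return beta


def likelihood(obs, init, trans, emit):
    """Similarity score of the sequence under the model: P(obs | model)."""
    return sum(forward(obs, init, trans, emit)[-1])


def state_posteriors(obs, init, trans, emit):
    """gamma[t][i] = P(state at t = i | obs), the quantity Baum-Welch
    uses when re-estimating emission probabilities."""
    alpha = forward(obs, init, trans, emit)
    beta = backward(obs, trans, emit)
    total = sum(alpha[-1])
    return [[a * b / total for a, b in zip(alpha[t], beta[t])]
            for t in range(len(obs))]
```

Calling `likelihood("ACGT", INIT, TRANS, EMIT)` gives the probability of the sequence under the toy model, and `state_posteriors` returns one probability distribution over states per position. Each forward column depends only on the previous column; the abstract's "predictable data dependency patterns" plausibly refers to regularity of this kind, though the sketch makes no claim about how ApHMM exploits it.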

Files

3632950.pdf

(pdf | 2.13 MB)