CIM-architecture for acceleration of DNA pre-alignment filters

Abstract

Due to recent developments in DNA sequencing technology, there is a growing abundance of available genomic data. To be useful in fields such as healthcare and forensics, raw sequencing data must be processed using computationally intensive algorithms. Currently, one of the major bottlenecks in this processing pipeline is the alignment step, which relies on dynamic-programming (DP) algorithms. Numerous solutions have been proposed to reduce the execution time of this step, either by accelerating alignment itself using hardware accelerators and heuristics, or by reducing the amount of input data through pre-alignment filters. The filtering algorithms are less computationally intensive than DP-based alignment, which reduces the end-to-end alignment time.

Pre-alignment filters are now effective to the point where the bottleneck has shifted from alignment to the filtering step itself. For this reason, the filters are accelerated on hardware such as GPUs and FPGAs. While these solutions improve execution times by orders of magnitude, they are insufficient to remove the filtering bottleneck entirely, because their performance is limited by the rate at which data can be supplied. As a solution, we propose a computation-in-memory (CIM) accelerator that reduces data-movement overheads between the host device and the accelerator. This architecture uses emerging non-volatile memories to perform Boolean operations directly within its memory elements, which allows it to exploit parallelism in the algorithms and achieve higher throughput.

In this work, we explore operations common to existing pre-alignment filters and devise ways to implement them on the CIM architecture. The proposed architecture is flexible: it supports multiple pre-alignment filters and a wide range of input data. Its functionality is verified through simulation, and its effectiveness is tested using real data sets.
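To illustrate the kind of Boolean operation such filters rely on, the following minimal sketch counts mismatches between a read and a reference segment using only XOR, OR, AND, shift, and a population count. It is not the filter or the architecture described in this work. The 2-bit base encoding, the function names, and the `max_edits` threshold are assumptions made for illustration, and the sketch ignores insertions and deletions, which practical filters must tolerate.

```python
# Hypothetical 2-bit encoding of the four bases (an assumption, not from the source).
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base."""
    word = 0
    for base in seq:
        word = (word << 2) | ENCODING[base]
    return word


def mismatch_count(read: str, ref: str) -> int:
    """Count differing positions with bitwise operations only."""
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have equal length")
    n = len(read)
    # XOR leaves a non-zero 2-bit pair wherever the bases differ.
    diff = encode(read) ^ encode(ref)
    # Mask with a 1 in the low bit of every 2-bit pair (0b0101...01).
    low_mask = ((1 << (2 * n)) - 1) // 3
    # OR the high bit of each pair into its low bit, then keep only the low bits,
    # so each mismatching base contributes exactly one set bit.
    flags = (diff | (diff >> 1)) & low_mask
    return bin(flags).count("1")


def passes_filter(read: str, ref: str, max_edits: int) -> bool:
    """Accept the pair for full alignment only if mismatches stay within max_edits."""
    return mismatch_count(read, ref) <= max_edits
```

A pair rejected by `passes_filter` would never reach the DP-based aligner. Every step is a bitwise operation applied uniformly across the whole word, which is the kind of data-parallel work that in-memory Boolean logic targets.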

Using this architecture, we achieve end-to-end execution-time improvements over the state of the art ranging from 7.2x to 119.6x for the evaluated data sets, while reducing chip area by up to 59% and power consumption by up to 79.7%.

Furthermore, this work offers a platform for developing future pre-alignment filtering algorithms that further improve performance.