Accelerating Protein Sequence Alignment with Different Parallel Hardware Platforms

More Info
expand_more

Abstract

Detecting similarities between (RNA, DNA, and protein) sequences is an important part of bioinformatics. Among the algorithms used to accomplish this, the Smith-Waterman algorithm is very popular. A sequential implementation of Smith-Waterman requires quadratic running time with respect to the length of the sequences. As the amount of data in this field is continuously increasing, quick analysis through a sequential implementation is no longer feasible. One way to reduce the running time is by using parallelism and parallel platforms. There is a great diversity of hardware platforms that enable parallelism in different ways, each favoring different types of computations. The subject of this thesis is to understand the performance of the state-of-the-art parallel implementation of the Smith-Waterman algorithm, PaSWAS, on different parallel hardware platforms. PaSWAS has been designed and implemented using CUDA, the proprietary framework from NVIDIA. This choice limits the PaSWAS functionality to NVIDIA GPUs. By using OpenCL, a platform independent, standard programming model for many-cores, we enable PaSWAS to run in parallel on other hardware platforms. We show that, for NVIDIA GPUs, the portability enabled by OpenCL comes at the expense of performance. We further define a set of platform-specific parameters that have a high performance impact for the OpenCL implementation, and demonstrate empirically that their values are different for different hardware platforms. We also demonstrate that proper partitioning of the sequences can increase parallelism, which leads to better performance. Lastly, we create a performance estimator which is able to predict the execution time of the PaSWAS algorithm on different hardware platforms for given dataset. This enables us to determine a-priori which hardware platform to use for a given dataset. We conclude that PaSWAS is a very effective parallel implementation of the Smith-Waterman algorithm, which delivers excellent results for GPUs (in both CUDA and OpenCL), and can be quite effective on CPUs, too. The performance vs. portability tradeoff of OpenCL is relevant for PaSWAS, and it is ultimately the choice of the end user which of the two is more relevant.