Implementation of Bio-Informatics Applications on Various GPU Platforms

Abstract

As of 2012, the world creates 2.5 quintillion bytes of data every day. Much of this data is what we refer to as Big Data. To explore how Big Data can create value, and to show the technical challenges that accompany Big Data applications, we choose an application from bio-informatics: the Smith Waterman genetic database alignment algorithm, which is used for finding optimal genetic sequence alignments. The continuous increase in the volume of data in genetic databases leads to an exponential increase in the time required for comparing these genetic sequences. This thesis investigates the acceleration and optimization of the Smith Waterman algorithm using GPU platforms. The thesis uses DOPA, an existing implementation that was optimized for the GTX275 GPU platform from NVIDIA. DOPA achieved a large performance gain compared to other implementations running sequentially on a CPU. Our thesis aims to study and improve the behavior of this implementation on two other NVIDIA GPUs: the Tesla C2075 and the GeForce GT640. We improved the core occupancy of DOPA on these GPU cards, resulting in a more efficient workload distribution and improving performance by about 17% to 61%. We achieved 25 GCUPS on the C2075 and 11 GCUPS on the GT640, compared to a straightforward DOPA port on the same cards achieving 21.9 and 6.8 GCUPS, respectively. To achieve considerable performance for Big Data applications on different platforms, two important factors have to be taken into account: increasing the parallelism in the software and increasing the utilization of the hardware. We also evaluated and presented other metrics, such as cost in euros and power in watts, to be considered along with GPU performance.
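
For readers unfamiliar with the algorithm and the GCUPS metric named in the abstract, the following is a minimal sequential sketch in Python. It is not DOPA and not the GPU implementation studied in the thesis. The function names, the scoring values (match = 2, mismatch = -1) and the linear gap penalty are assumptions chosen for illustration only. GCUPS is taken here in its usual sense of billions of dynamic-programming cell updates per second.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of two sequences.

    Illustrative only: scoring values and the linear gap penalty are
    assumptions, not the parameters used by DOPA.
    """
    prev = [0] * (len(target) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # first column is always zero
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # zero floor makes the alignment local
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, database_residues, seconds):
    """Giga cell updates per second: DP cells filled divided by run time."""
    return query_len * database_residues / seconds / 1e9
```

Each cell depends only on its upper, left and upper-left neighbours, which is why many database sequences can be scored independently in parallel. Under this definition, a hypothetical run that aligns a 1000-residue query against 25 million database residues in one second corresponds to 25 GCUPS.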