Acceleration of Seed Extension for BWA-MEM DNA Alignment Using GPUs

More Info
expand_more

Abstract

DNA aligning is a compute-intensive and time-consuming task required for all further DNA processing. It consists in finding for each DNA string from a sample its location in a reference genome. Usual techniques for short reads (hundreds of bases) involve seed-extension, where a small matching seed is found with quick search through FM-index and then extended on both ends with a custom Smith-Waterman algorithm, giving optimal solution. However this seed-extension takes a tremendous amount of time. This is why we present in this thesis a solution to offload extension on a GPU to be done in a parallel fashion. This is possible thanks to the fact that the DNA reads do not present any kind of relation between each other. We used the Burrows-Wheeler Aligner - Maximal Exact Match (BWA-MEM), a reference program in the field, to which we integrated a GPU-accelerated library for extension, GASAL2. However, BWA-MEM has a left-right dependency on extension starting scores, with the left alignment starting with the seed score, then the right part starting with the previously calculated left score. We designed a solution by starting both extensions with the seed score, we called this method the "seed-only" paradigm. On our test machine featuring 12 hyperthreaded cores and an NVIDIA Tesla K40c, when running with 12 threads, we could observe a raw kernel speed-up of 4.8x ; but if we allow non-blocking calls to let the CPU run the seeding tasks while the GPU computes the extension, we can reach a 16x effective speed-up for the extension. This extension part takes around 27% of the total time but our acceleration introduces a small overhead due to memory preparations and copying, which makes the whole application 1.28x faster, getting close to the theoretical maximum of 1.37x. Additionally, the paradigm shift we operated creates minimal differences in the final main scores on good quality alignments, with a 1.82% difference for our 5.2 million sequences data set. This makes our acceleration with GASAL2 an solid and efficient solution for a single machine.