Genetic sequence alignment on a supercomputing platform


Abstract

Genetic sequence alignment is an important tool for researchers: it reveals the differences and similarities between two genetic sequences. It is used in several fields, such as homology research, autoimmune disease research and protein shape estimation. Various algorithms can perform this task, and several hardware platforms can deliver the necessary computational power. Given the large volume of the datasets used, throughput is now the major bottleneck in sequence alignment. In this thesis we discuss some of the existing solutions for high-throughput genetic sequence alignment and present a new one. Our solution implements the well-known Smith-Waterman optimal local alignment algorithm on the HC-1 hybrid supercomputer from Convey Computer. This platform features four FPGAs that can be used to accelerate the problem in question. The FPGAs and the CPU that controls them live in the same virtual memory space and share one large memory. We developed a hardware description for the FPGAs and a software program for the CPU. Our focus points included sustainable peak performance, the ability to align sequences of any length, FPGA-area-efficient computation and the cancellation of unnecessary workload. The result is a Smith-Waterman FPGA core that can run at 100% utilization across many consecutive alignments. Six such cores fit on each FPGA, running at 150 MHz, which results in a full-system performance of 460 GCUPS (billion elementary cell-update operations per second). Our elementary processing element delivers twice the work per clock cycle of a naive implementation, resulting in a better throughput-per-area ratio. At the system level, a notable amount of workload is cancelled. It is the most flexible implementation we are aware of. We re-evaluate the use of FPGAs for accelerating Smith-Waterman and conclude that they will continue to be a good choice per dollar and per watt, as long as we narrow the problem space.
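
For readers unfamiliar with the algorithm, the following is a minimal software sketch of Smith-Waterman local alignment scoring and of how a GCUPS figure is computed. It is not the FPGA design described in the abstract. The scoring values (match, mismatch, gap), the linear gap penalty and the function names are assumptions chosen for illustration; the thesis may use a different scoring scheme.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of two sequences.

    Assumed scoring (illustrative only): +2 for a match, -1 for a
    mismatch, -1 per gap position (linear gap penalty).
    """
    # Only the previous row of the score matrix is needed for the score.
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always zero
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            # Clamping at zero is what makes the alignment local.
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Billions of matrix cell updates per second for one alignment."""
    return query_len * target_len / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
    print(gcups(100_000, 20_000, 1.0))
```

Each inner-loop iteration is one cell update, the elementary operation counted by GCUPS: aligning sequences of lengths m and n costs m × n updates regardless of the scores involved.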
