Konstantina Koliogeorgi
2 records found
1
GANDAFL
Dataflow Acceleration for Short Read Alignment on NGS Data
DNA read alignment is an integral part of genome study, which has been revolutionised by the growth of Next Generation Sequencing (NGS) technologies. The inherent computational intensity of string matching algorithms such as Smith-Waterman (SmW), combined with the vast amount of NGS input data, creates a bottleneck in the workflows. Accelerated reconfigurable computing has been extensively leveraged to alleviate this bottleneck, focusing on high-performance albeit standalone implementations. In existing accelerated solutions, effective co-design of NGS short-read alignment remains an open issue, mainly due to a narrow view of real integration aspects such as system-wide communication and accelerator call overheads. In this paper, we first propose GANDAFL, a novel Genome AligNment DAta-FLow architecture for the SmW Matrix-fill and Traceback stages that performs high-throughput short-read alignment on NGS data. We then propose a radical software restructuring of the widely used Bowtie2 aligner that allows reads to be aligned in batches, exposing acceleration capabilities. Batch alignment minimizes the calling overhead of the accelerators, whereas moving both Matrix-fill and Traceback on chip eliminates the data communication overheads. The standalone solution delivers speedups of up to 116× and 2× over state-of-the-art software and hardware accelerators respectively, and the GANDAFL-enhanced Bowtie2 aligner delivers a 1.9× speedup.
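To make the two SmW stages named in the abstract concrete, the following is a minimal software sketch of Matrix-fill followed by Traceback, plus a helper that aligns a batch of reads in a single call. It is not the GANDAFL dataflow architecture or Bowtie2 code: the function names `smith_waterman` and `align_batch`, the linear gap model, and the scoring values (match 2, mismatch -1, gap -2) are all assumptions chosen for illustration.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Local alignment of `read` against `ref` (illustrative scoring values)."""
    rows, cols = len(read) + 1, len(ref) + 1

    # Matrix-fill stage: H[i][j] is the best local score ending at (i, j).
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # match / mismatch
                          H[i - 1][j] + gap,       # gap in the reference
                          H[i][j - 1] + gap)       # gap in the read
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback stage: walk back from the best cell until a zero score.
    i, j = best_pos
    aligned_read, aligned_ref = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            aligned_read.append(read[i - 1])
            aligned_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_read.append(read[i - 1])
            aligned_ref.append("-")
            i -= 1
        else:
            aligned_read.append("-")
            aligned_ref.append(ref[j - 1])
            j -= 1

    return best, "".join(reversed(aligned_read)), "".join(reversed(aligned_ref))


def align_batch(reads, ref, **scoring):
    """Align many reads per call (hypothetical helper, assumed interface)."""
    return [smith_waterman(read, ref, **scoring) for read in reads]
```

In this sketch `align_batch` only loops in software, so it gains nothing by itself. The point of the batch interface is that an accelerator would be invoked once per batch rather than once per read, which is the calling overhead the abstract refers to.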
This paper presents the cloud infrastructure of the AEGLE project, which aims to integrate cloud technologies with heterogeneous reconfigurable computing in large-scale healthcare systems for Big Bio-Data analytics. AEGLE's engineering concept brings together popular big-data engines with emerging acceleration technologies, laying the foundation for personalized and integrated health-care services while also promoting related research activities. We introduce the design of AEGLE's accelerated infrastructure, along with the corresponding software and hardware acceleration stacks that support various big data analytics workloads. We show that, through effective resource containerization, AEGLE's cloud infrastructure is able to support high heterogeneity with regard to storage types, execution engines, utilized tools and execution platforms. Special care is given to the integration of high-performance accelerators within the overall software stack of AEGLE's infrastructure, which enable efficient execution of analytics, with speedups of up to 140× over pure software execution according to our preliminary evaluations.