Dataflow Hardware Design for Big Data Acceleration Using Typed Interfaces

Master Thesis (2020)
Author(s)

A. Hadnagy (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Z. Al-Ars – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2020
Language
English
Graduation Date
26-08-2020
Awarding Institution
Delft University of Technology
Faculty
Electrical Engineering, Mathematics and Computer Science
Downloads counter
308
Collections
thesis
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recent trends in large-scale computing demonstrate continuous growth in the need for raw processing performance. At the same time, the slowdown of vertical scaling pushes the industry towards more energy-efficient heterogeneous architectures. With the appearance of FPGAs in the cloud and data centers, a new architecture is offered for offloading processing tasks and to bundle custom processing hardware with the applications. However, with great adaptability comes the increased complexity of development. The adoption of custom accelerators has been bounded by their limited programming models and the long turnaround time of development.

In this thesis, we look at current trends in the digital hardware design and synthesis to evaluate them in a big data context and identify the bottlenecks that limit productivity in the development and integration of domain-specific accelerators.

Based on the findings, we propose a composition language for components that implement typed interfaces to streamline kernel development. The language allows developers to compose accelerators from individual processing units that implement custom dataflow interfaces in a productive way. The productivity boost and utility of the language were evaluated on a practical use-case, showing almost two orders of magnitude reduction in code size. The performance of the proposed approach was benchmarked on a Power9 system with OpenCAPI, where our proof-of-concept accelerator kernel was able to achieve 4.04GB/s throughput using only 3.75% of the available FPGA resources. The integration of the accelerator led to a 13x speedup compared to a CPU-based Apache Spark implementation of the same algorithm.

Files

MSc_Thesis_Akos_Hadnagy.pdf
(pdf | 1.25 Mb)
License info not available