Dataflow Hardware Design for Big Data Acceleration Using Typed Interfaces

Abstract

Recent trends in large-scale computing demonstrate continuous growth in the need for raw processing performance. At the same time, the slowdown of vertical scaling pushes the industry towards more energy-efficient heterogeneous architectures. With the appearance of FPGAs in cloud and data-center environments, a new architecture has become available for offloading processing tasks and bundling custom processing hardware with applications. However, this adaptability comes at the cost of increased development complexity: the adoption of custom accelerators has been held back by their limited programming models and long development turnaround times.

In this thesis, we examine current trends in digital hardware design and synthesis, evaluate them in a big data context, and identify the bottlenecks that limit productivity in the development and integration of domain-specific accelerators.

Based on these findings, we propose a composition language for components that implement typed interfaces, aimed at streamlining kernel development. The language allows developers to productively compose accelerators from individual processing units that implement custom dataflow interfaces. The productivity gain and utility of the language were evaluated on a practical use case, showing a reduction in code size of almost two orders of magnitude. The performance of the proposed approach was benchmarked on a Power9 system with OpenCAPI, where our proof-of-concept accelerator kernel achieved 4.04 GB/s throughput while using only 3.75% of the available FPGA resources. Integrating the accelerator led to a 13x speedup over a CPU-based Apache Spark implementation of the same algorithm.
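
To make the idea of typed-interface composition concrete, the sketch below is a hypothetical Scala analogy, not the thesis's actual composition language or syntax. It models processing units as components with typed stream endpoints, so two units can only be connected when the output element type of one matches the input element type of the next, and ill-typed pipelines are rejected at compile time. All names in the sketch (Kernel, andThen, the record types) are illustrative assumptions.

    // Hypothetical sketch of typed dataflow composition (not the thesis's actual language).
    // A Kernel transforms a stream of In elements into a stream of Out elements;
    // the type parameters stand in for the typed hardware stream interfaces.
    trait Kernel[In, Out] { self =>
      def name: String

      // Composition is only legal when the element types line up,
      // which the Scala compiler checks at compile time.
      def andThen[Next](next: Kernel[Out, Next]): Kernel[In, Next] =
        new Kernel[In, Next] {
          val name = s"${self.name} -> ${next.name}"
        }
    }

    // Illustrative element types carried on the streams.
    final case class RawRecord(bytes: Array[Byte])
    final case class ParsedRecord(key: String, value: Long)
    final case class Aggregate(key: String, sum: Long)

    // Illustrative processing units.
    object Decode extends Kernel[RawRecord, ParsedRecord] { val name = "decode" }
    object Filter extends Kernel[ParsedRecord, ParsedRecord] { val name = "filter" }
    object Reduce extends Kernel[ParsedRecord, Aggregate] { val name = "reduce" }

    object Pipeline extends App {
      // Compose an accelerator pipeline from the individual units;
      // a mismatch in element types would fail to compile.
      val accelerator: Kernel[RawRecord, Aggregate] =
        Decode.andThen(Filter).andThen(Reduce)
      println(accelerator.name) // decode -> filter -> reduce
    }

In this analogy, swapping or reordering stages is a purely type-checked operation, which mirrors how typed dataflow interfaces let the composition of hardware processing units be validated before synthesis.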