Scaling up data analytics in Python using multiple FPGAs

Abstract

Big data applications are becoming more commonplace due to an abundance of digital data and increasingly powerful hardware. One such class of hardware devices is the FPGA, which is used today in settings ranging from data centers to embedded systems. High performance, power efficiency, and reprogrammability are the primary reasons behind their wide adoption. Another trend in recent years has been the use of distributed data processing frameworks such as Apache Spark to improve the performance of big data applications. Traditionally, such frameworks are deployed on commodity hardware to save costs. This approach is widespread, with organizations often operating on-premise compute clusters or using a cloud provider to access a managed cluster. This project attempts to combine the two worlds mentioned above: FPGAs and distributed data processing. We have designed an architecture that allows us to use FPGAs as end-devices in a compute cluster, performing the actual computation instead of CPUs. The architecture is composed of several open source technologies and allows us to interact with an FPGA cluster from Python. Using a high-level programming language such as Python makes the system easy to use for software developers and data scientists, and also abstracts away the internal communication within the cluster. We have built prototypes based on this architecture for three hardware platforms (FPGA families) and three specific applications to demonstrate general applicability. We have observed noticeable performance gains in these applications by scaling up the FPGA cluster.
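To illustrate the style of usage the abstract describes, the following is a minimal sketch of how a data scientist might drive such an FPGA cluster from Python via Apache Spark. It assumes one FPGA per worker node; the wrapper function run_fpga_kernel is purely hypothetical and stands in for whatever device interface the worker nodes actually expose, which the abstract does not specify.

from pyspark.sql import SparkSession

def run_fpga_kernel(rows):
    # Hypothetical placeholder: a real implementation would stream the
    # partition to the worker's local FPGA and yield the accelerated results.
    for row in rows:
        yield row

spark = SparkSession.builder.appName("fpga-cluster-demo").getOrCreate()
df = spark.read.parquet("input.parquet")

# Each partition is handled on a worker node, where the heavy computation
# is offloaded to the FPGA instead of the CPU.
result = df.rdd.mapPartitions(run_fpga_kernel)
print(result.count())

From the user's point of view this looks like ordinary PySpark code; the FPGA offload and the communication inside the cluster are hidden behind the architecture described above.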