Title: Exploring the Limits of Query Pushdown for SQL Acceleration on FPGAs
Author: Yönsel, Yüksel (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Al-Ars, Z. (mentor); Hauff, C. (graduation committee); Hoozemans, J.J. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Engineering
Date: 2021-09-21
Abstract: There has been an increasing interest in moving computation closer to storage in recent years due to significant improvements in memory technology. FPGAs have proven to be an exciting candidate for accelerating database workloads since they provide an energy-efficient, reconfigurable and high-performance computation platform. Therefore, FPGAs are widely used as attached accelerators in data-centric applications.
Database operations usually run on large volumes of data, which creates an I/O bottleneck when processing them on CPUs. Therefore, researchers have recently been investigating query pushdown techniques during a database load operation. A well-known columnar storage format, Apache Parquet, provides an efficient way to store a database. In addition, current big data processing engines provide functionality for pushing filter operations down to the Parquet reading stage.
This study explores the boundaries of pushing down analytic queries to the Parquet reader stage by using FPGAs. An extended roofline analysis is performed on a proof-of-concept hardware design. The analysis shows that peak performance is achieved via a storage-attached accelerator once a high-bandwidth interface is introduced. Furthermore, using multiple FPGAs with flash storage while interfacing them with OpenCAPI or a PCI switch enables higher performance for aggregation, since aggregation is shown to be I/O bound.
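As an illustration of the roofline reasoning mentioned in the abstract, the following is a minimal sketch of the classic roofline bound, not the extended analysis performed in the thesis. The function names and all numbers are hypothetical and chosen only to show why an I/O-bound kernel speeds up when the interface bandwidth grows.

```python
def attainable_performance(peak_gops, bandwidth_gbps, intensity_ops_per_byte):
    """Classic roofline bound: min(compute roof, bandwidth * operational intensity)."""
    return min(peak_gops, bandwidth_gbps * intensity_ops_per_byte)


def is_io_bound(peak_gops, bandwidth_gbps, intensity_ops_per_byte):
    """True when the bandwidth roof, not the compute roof, limits performance."""
    return bandwidth_gbps * intensity_ops_per_byte < peak_gops


# Hypothetical accelerator: 16 Gop/s compute roof, 0.5 op per byte read.
PEAK_GOPS = 16.0
INTENSITY = 0.5

# Hypothetical interface bandwidths in GB/s (not measurements from the thesis).
for bandwidth in (3.0, 12.0, 64.0):
    perf = attainable_performance(PEAK_GOPS, bandwidth, INTENSITY)
    bound = "I/O bound" if is_io_bound(PEAK_GOPS, bandwidth, INTENSITY) else "compute bound"
    print(f"{bandwidth:5.1f} GB/s -> {perf:5.1f} Gop/s ({bound})")
```

In this toy setting, the kernel reaches the compute roof only once the interface is fast enough, which is the general shape of the argument that a high-bandwidth interface is needed for peak performance.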
The thesis introduces an Apache Spark integration of the proof-of-concept query pushdown for Parquet reading operations. Apache Spark implements several layers of parallelism to achieve higher speed-ups. However, multi-threaded Apache Spark applications that share a single FPGA instance must synchronize their access to it, because the FPGA is a constrained resource. Therefore, this work suggests a way to achieve synchronization with a single FPGA instance. For a single Spark thread, a maximum end-to-end application speed-up of 3.88x and a kernel speed-up of 7.24x are achieved. As a result, the throughput of TPC-H Query 6 can be increased up to 3.8 GB/s. Furthermore, the FPGA performs better than the CPU until Spark is configured to run on 7 CPU threads. For the scaled-up multi-threaded Spark application with six CPU threads, the FPGA achieves a 1.13x end-to-end application speed-up and a kernel speed-up of 13.19x.
Subject: FPGA; Heterogeneous Computing; Hardware Accelerator; Apache Spark; Fletcher; Predicate Pushdown
To reference this document use: http://resolver.tudelft.nl/uuid:85d80b28-f1ed-4e52-b233-1c20a7ba376b
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Yüksel Yönsel
Files: PDF, Yuksel_Yonsel_MSc_Thesis.pdf, 2.86 MB
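For readers unfamiliar with the benchmark named in the abstract, TPC-H Query 6 is a filter followed by a single aggregation, which is why it suits pushdown into the reader stage. The sketch below expresses the query logic in plain Python with the commonly used default parameters (ship date in 1994, discount of 0.06 ± 0.01, quantity below 24). The sample rows are invented for illustration, and integer percent and cent units are an assumption made here to avoid floating-point comparisons; this is not the thesis's hardware or Spark implementation.

```python
from datetime import date

# Hypothetical stand-in for a few TPC-H lineitem rows:
# (l_shipdate, l_discount in percent, l_quantity, l_extendedprice in cents)
ROWS = [
    (date(1994, 3, 1), 6, 10, 100_000),   # passes all predicates
    (date(1994, 7, 15), 5, 23, 200_000),  # passes all predicates
    (date(1995, 1, 1), 6, 10, 300_000),   # rejected: shipped outside 1994
    (date(1994, 5, 5), 8, 10, 400_000),   # rejected: discount above 7 percent
    (date(1994, 9, 9), 7, 24, 500_000),   # rejected: quantity not below 24
]


def q6_revenue_cents(rows):
    """Sum l_extendedprice * l_discount over rows that pass the Q6 filter."""
    total = 0
    for shipdate, discount_pct, quantity, price_cents in rows:
        in_year = date(1994, 1, 1) <= shipdate < date(1995, 1, 1)
        in_discount_range = 5 <= discount_pct <= 7
        if in_year and in_discount_range and quantity < 24:
            total += price_cents * discount_pct
    # Discounts are stored in percent, so divide by 100 to get cents.
    return total // 100


print(q6_revenue_cents(ROWS))  # prints 16000
```

With pushdown, only the final aggregate would need to leave the reader stage rather than every scanned row, which is the data-movement saving that the abstract's I/O-bound argument relies on.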