Transparently Accelerating Spark SQL Code on Computing Hardware
Abstract
Through new digital business models, the importance of big data analytics continuously grows. Initially, data analytics clusters were mainly bound by the throughput of network links and the performance of I/O operations. With recent hardware developments this has changed, and the performance of CPUs and memory access has often become the new limiting factor. Heterogeneous computing systems, consisting of CPUs and other computing hardware such as GPUs and FPGAs, try to overcome this by offloading the computational work to the most suitable hardware.
Accelerating computations by offloading work to specialized computing hardware often requires expert knowledge and extensive effort. In contrast, Apache Spark has become one of the most widely used data analytics tools, among other reasons because of its user-friendly API. Notably, the Spark SQL component allows defining declarative queries without having to write any code. The present work investigates how to reduce this gap and elaborates on how Spark SQL's internal information can be used to offload computations without the user having to configure Spark further.
To this end, the present work uses the Apache Arrow in-memory format to exchange data efficiently between different accelerators. It evaluates Spark SQL's extensibility for providing custom acceleration and its new columnar processing functionality, including their compatibility with the Apache Arrow format. Furthermore, the present work demonstrates the technical feasibility of such an acceleration by providing a Proof-of-Concept implementation, which integrates Spark with tools from the Arrow ecosystem, such as Gandiva and Fletcher. Gandiva uses the SIMD capabilities of modern CPUs to accelerate computations, and Fletcher allows the execution of FPGA-accelerated computations. Finally, the present work demonstrates that, even for simple computations, integrating these accelerators leads to significant performance improvements: with Gandiva, the computation became 1.27 times faster, and with Fletcher up to 13 times faster.