Tens of gigabytes per second JSON-to-Arrow conversion with FPGA accelerators

Conference Paper (2021)
Author(s)

Johan Peltenburg (TU Delft - Computer Engineering)

A. Hadnagy (TU Delft - Computer Engineering)

M. Brobbel (Teratide)

Robert Morrow (Sigmax.ai Inc)

Zaid Al-Ars (TU Delft - Computer Engineering)

Research Group
Computer Engineering
Copyright
© 2021 J.W. Peltenburg, A. Hadnagy, M. Brobbel, Robert Morrow, Z. Al-Ars
DOI related publication
https://doi.org/10.1109/ICFPT52863.2021.9609833
Publication Year
2021
Language
English
Pages (from-to)
1-9
ISBN (print)
978-1-6654-2011-2
ISBN (electronic)
978-1-6654-2010-5
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

JSON is a popular data interchange format for many web, cloud, and IoT systems due to its simplicity, human readability, and widespread support. However, applications must first parse and convert the data to a native in-memory format before being able to perform useful computations. Many big data applications with high performance requirements convert JSON data to Apache Arrow RecordBatches, the latter being a widely used columnar in-memory format for large tabular data sets used in data analytics. In this paper, we analyze the performance characteristics of such applications and show that JSON parsing represents a bottleneck in the system. Various strategies are explored to speed up JSON parsing on CPU and GPU as much as possible. Due to the performance limitations of the CPU and GPU implementations, we furthermore present an FPGA-accelerated implementation. We explain how hardware components that can parse variable-sized and nested structures can be combined to produce JSON parsers for any type of JSON document. Several fully integrated FPGA-accelerated JSON parser implementations are presented using the Intel Arria 10 GX and Xilinx VU37P devices, and compared to the performance of their respective host systems: an Intel Xeon and an IBM POWER9 system. Results show that the accelerators achieve an end-to-end throughput close to 7 GB/s with the Arria 10 GX using PCIe, and close to 20 GB/s with the VU37P using OpenCAPI 3. Depending on the complexity of the JSON data to parse, the bandwidth is limited by the host-to-accelerator interface or by the available FPGA resources. Overall, this provides a throughput increase of up to 6x compared to the baseline application. We also observe a full-system energy efficiency improvement, with up to 59x more JSON data parsed per joule.
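To make the conversion task in the abstract concrete, the following is a minimal software sketch of turning row-oriented JSON records into a column-oriented layout. It is not the authors' parser and does not use the Apache Arrow library. The record shape (an "id" field and a variable-length "values" list) and the function name json_lines_to_columns are assumptions made purely for illustration. The offsets-plus-data representation mirrors the general idea behind Arrow's variable-size list layout, where a list column is stored as one offsets buffer with one more entry than there are records, plus one concatenated values buffer.

```python
import json


def json_lines_to_columns(text):
    """Convert newline-delimited JSON objects of the hypothetical form
    {"id": int, "values": [int, ...]} into a column-oriented layout.

    This is an illustrative sketch, not a real Arrow RecordBatch.
    """
    ids = []        # fixed-width column: one entry per record
    offsets = [0]   # list column: record i spans data[offsets[i]:offsets[i + 1]]
    data = []       # list column: all list elements, concatenated

    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        ids.append(record["id"])
        data.extend(record["values"])
        offsets.append(len(data))  # end of this record's list

    return {"id": ids, "values_offsets": offsets, "values_data": data}


# Hypothetical sample input: three records, one with an empty list.
sample = (
    '{"id": 1, "values": [10, 20]}\n'
    '{"id": 2, "values": []}\n'
    '{"id": 3, "values": [30]}\n'
)

columns = json_lines_to_columns(sample)
print(columns)
```

Because every record can carry a different number of list elements, the output size of each column is only known after parsing. This is the kind of variable-sized, nested structure that the abstract says the hardware components must handle.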
