Tens of gigabytes per second JSON-to-Arrow conversion with FPGA accelerators

Conference Paper (2021)

Author(s)

Johan Peltenburg (TU Delft - Computer Engineering)

A. Hadnagy (TU Delft - Computer Engineering)

M. Brobbel (Teratide)

Robert Morrow (Sigmax.ai Inc)

Zaid Al-Ars (TU Delft - Computer Engineering)

Research Group

Computer Engineering

Copyright

DOI related publication

https://doi.org/10.1109/ICFPT52863.2021.9609833

FPGA Parsing Apache Arrow Accelerator JSON

To reference this document use:

https://resolver.tudelft.nl/uuid:dfb14c9e-46ec-47f3-864e-e1d6eacc81ba

Publication Year

2021

Language

English

Pages (from-to)

1-9

ISBN (print)

978-1-6654-2011-2

ISBN (electronic)

978-1-6654-2010-5

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

JSON is a popular data interchange format for many web, cloud, and IoT systems due to its simplicity, human readability, and widespread support. However, applications must first parse and convert the data to a native in-memory format before they can perform useful computations. Many big data applications with high performance requirements convert JSON data to Apache Arrow RecordBatches, Arrow being a widely used columnar in-memory format for large tabular data sets in data analytics. In this paper, we analyze the performance characteristics of such applications and show that JSON parsing is a bottleneck in the system. Various strategies are explored to speed up JSON parsing on CPU and GPU as much as possible. Due to the performance limitations of the CPU and GPU implementations, we furthermore present an FPGA-accelerated implementation. We explain how hardware components that can parse variable-sized and nested structures can be combined to produce JSON parsers for any type of JSON document. Several fully integrated FPGA-accelerated JSON parser implementations are presented using the Intel Arria 10 GX and Xilinx VU37P devices, and compared to the performance of their respective host systems: an Intel Xeon and an IBM POWER9 system. Results show that the accelerators achieve an end-to-end throughput close to 7 GB/s with the Arria 10 GX using PCIe, and close to 20 GB/s with the VU37P using OpenCAPI 3. Depending on the complexity of the JSON data to parse, the achieved bandwidth is limited either by the host-to-accelerator interface or by the available FPGA resources. Overall, this provides a throughput increase of up to 6x compared to the baseline application. We also observe a full-system energy efficiency improvement of up to 59x more JSON data parsed per joule.
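To make the conversion target concrete, the following sketch shows how a few newline-delimited JSON records with a variable-length string field and a nested list field map onto Arrow's columnar layout of offset buffers and contiguous value buffers. It is a simplified illustration under stated assumptions, not the parser described in the paper: the field names `name` and `values` and the helper `to_columns` are hypothetical, validity bitmaps and type checking are omitted, and parsing is done with Python's standard `json` module rather than on CPU, GPU, or FPGA hardware.

```python
import json

# Hypothetical input: newline-delimited JSON records with a string field
# ("name") and a nested list-of-integers field ("values").
NDJSON = """{"name": "alice", "values": [1, 2, 3]}
{"name": "bob", "values": []}
{"name": "carol", "values": [4, 5]}"""


def to_columns(ndjson: str):
    """Convert NDJSON records into Arrow-style (offsets, data) pairs.

    Simplified sketch: no validity bitmaps, no schema/type checks,
    and Python lists stand in for Arrow's fixed-width buffers.
    """
    name_offsets = [0]        # n + 1 offsets into name_data (UTF-8 bytes)
    name_data = bytearray()   # all string bytes, packed back to back
    values_offsets = [0]      # n + 1 offsets into values_data
    values_data = []          # all list elements, packed back to back

    for line in ndjson.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)

        # Variable-size string column: append bytes, record end offset.
        name_data.extend(record["name"].encode("utf-8"))
        name_offsets.append(len(name_data))

        # Nested list column: append child values, record end offset.
        values_data.extend(record["values"])
        values_offsets.append(len(values_data))

    return {
        "name": (name_offsets, bytes(name_data)),
        "values": (values_offsets, values_data),
    }


columns = to_columns(NDJSON)
print(columns["name"])    # ([0, 5, 8, 13], b'alicebobcarol')
print(columns["values"])  # ([0, 3, 3, 5], [1, 2, 3, 4, 5])
```

Each column holds one more offset than there are records, and the length of record i is the difference between offsets i + 1 and i (the empty list for "bob" shows up as two equal offsets). This packing is what lets variable-sized and nested values be stored contiguously, and producing these buffers from raw JSON text is the work the accelerators in the paper take over.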

Files

JSON_parser_Camera_Ready_.pdf

(pdf | 0.887 Mb)

License info not available