An Efficient High-Throughput LZ77-Based Decompressor in Reconfigurable Logic

Journal Article (2020)
Author(s)

Jian Fang (National Innovation Institute of Defense Technology, TU Delft - Computer Engineering)

Jianyu Chen (Student TU Delft)

Jinho Lee (Yonsei University)

Zaid Al-Ars (TU Delft - Computer Engineering)

H.P. Hofstee (TU Delft - Computer Engineering, IBM Austin)

Research Group
Computer Engineering
Copyright
© 2020 J. Fang, Jianyu Chen, Jinho Lee, Z. Al-Ars, H.P. Hofstee
DOI related publication
https://doi.org/10.1007/s11265-020-01547-w
Publication Year
2020
Language
English
Issue number
9
Volume number
92
Pages (from-to)
931-947
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Making the best use of high-bandwidth storage and network technologies requires faster data decompression. We present a “refine and recycle” method applicable to LZ77-type decompressors that enables efficient high-bandwidth designs, together with an implementation in reconfigurable logic. The method refines the write commands (for literal tokens) and read commands (for copy tokens) into a set of commands that each target a single bank of block RAM and, rather than performing all the dependency calculations, saves logic by recycling read commands that return with an invalid result. A single “Snappy” decompressor implemented in reconfigurable logic using this method can process multiple literal or copy tokens per cycle and achieves up to 7.2 GB/s, which can keep pace with an NVMe device. The proposed method is about an order of magnitude faster and an order of magnitude more power efficient than a state-of-the-art single-core software implementation. The logic and block RAM resources required by the decompressor are low enough that a set of these decompressors can be implemented on a single FPGA of reasonable size to keep up with the bandwidth provided by the most recent interface technologies.
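
As a rough software illustration of the "refine and recycle" idea described in the abstract, the Python sketch below models an LZ77-style decompressor. It is not the authors' hardware design. The token encoding (`("lit", bytes)` and `("copy", offset, length)`), the names `refine` and `decompress`, and the constants `NUM_BANKS` and `BANK_WIDTH` are all assumptions made for this example. Each token is refined into pieces that stay within one bank line. A copy piece reads a snapshot of its source, commits only the bytes that are already valid, and is recycled (re-queued) for the remainder.

```python
from collections import deque

NUM_BANKS = 4    # assumed bank count, for illustration only
BANK_WIDTH = 4   # assumed bytes per bank line, for illustration only


def refine(dst, length):
    """Split output range [dst, dst + length) into (bank, dst, n) pieces,
    each confined to a single bank line."""
    pieces = []
    end = dst + length
    while dst < end:
        line_end = (dst // BANK_WIDTH + 1) * BANK_WIDTH
        n = min(end, line_end) - dst
        pieces.append(((dst // BANK_WIDTH) % NUM_BANKS, dst, n))
        dst += n
    return pieces


def decompress(tokens):
    """Decode hypothetical tokens ("lit", data) / ("copy", offset, length).
    Returns (output_bytes, number_of_recycled_read_commands)."""
    total = sum(len(t[1]) if t[0] == "lit" else t[2] for t in tokens)
    out = bytearray(total)
    valid = [False] * total          # per-byte "already written" flags
    reads = deque()                  # pending refined read (copy) commands
    pos = 0
    for tok in tokens:
        if tok[0] == "lit":
            data = tok[1]
            # Literal pieces are plain writes with no dependencies.
            for _bank, dst, n in refine(pos, len(data)):
                out[dst:dst + n] = data[dst - pos:dst - pos + n]
                valid[dst:dst + n] = [True] * n
            pos += len(data)
        else:
            _, offset, length = tok
            if offset < 1 or offset > pos:
                raise ValueError("copy offset points outside the history")
            for _bank, dst, n in refine(pos, length):
                reads.append((dst, dst - offset, n))
            pos += length

    recycles = 0
    while reads:
        dst, src, n = reads.popleft()
        # Count the leading source bytes that were valid when the read issued.
        ready = 0
        while ready < n and valid[src + ready]:
            ready += 1
        out[dst:dst + ready] = out[src:src + ready]
        valid[dst:dst + ready] = [True] * ready
        if ready < n:
            # Invalid read result: recycle the remainder instead of
            # computing the dependency up front.
            reads.append((dst + ready, src + ready, n - ready))
            recycles += 1
    return bytes(out), recycles
```

With these assumed parameters, `decompress([("lit", b"ab"), ("copy", 2, 6)])` produces `b"abababab"` and recycles one read command: the second refined piece of the overlapping copy depends on bytes that the same piece writes. The queue always drains because the earliest unwritten output byte depends only on bytes before it, which are already valid.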