Near-Precise Parameter Approximation for Multiple Multiplications on A Single DSP Block

Journal Article (2022)

Author(s)

Ercan Kalali (TU Delft - Signal Processing Systems)

Rene Van Leuken (TU Delft - Signal Processing Systems)

Research Group

Signal Processing Systems

Keywords

FPGA, Approximate computing, Multiple multiplications, DSP blocks, Systolic array

DOI related publication

https://doi.org/10.1109/TC.2021.3119187 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:9a38d662-3171-4242-8ce1-8c044242eee2

Publication Year

2022

Language

English

Issue number

9

Volume number

71

Article number

9566777

Pages (from-to)

2036-2047

Downloads counter

238

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

DSP blocks are an efficient way to implement multiply-accumulate (MAC) operations on FPGAs. However, because DSP blocks contain wide multipliers and adders, MAC operations on low bit-length parameters underutilize them. To address this, an efficient approximation technique is introduced. The technique manipulates and approximates the low bit-length parameters so that a single DSP block can execute multiple multiplications (Single DSP - Multiple Multiplication, SDMM). The accuracy of the technique was evaluated for different CNN weight bit precisions using the AlexNet and VGG-16 networks on the ImageNet ILSVRC-2012 dataset. In almost all cases the optimization can be applied without loss of accuracy; in a few cases it causes a slight accuracy loss. With these optimizations, multiple parameter multiplications are performed in a single DSP block at the cost of a small hardware overhead. The optimizations also store the parameters in a different format in off-chip memory, providing up to 33% compression without any hardware cost. A prototype systolic array architecture employing the optimizations was implemented on a Xilinx Zynq FPGA. It reduced the number of DSP blocks by 66.6%, 75%, and 83.3% for 8-, 6-, and 4-bit input variables, respectively.
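
To make the underutilization argument concrete, the following minimal Python sketch shows the generic operand-packing idea: two narrow multiplications that share one input are carried out by a single wide multiplication. This is an illustration only and is not the SDMM approximation scheme of the paper. The function name `packed_multiply`, the restriction to unsigned operands, and the default 4-bit width are all assumptions made for the example.

```python
def packed_multiply(a: int, w0: int, w1: int, bits: int = 4) -> tuple[int, int]:
    """Return (a * w0, a * w1) using one wide multiplication.

    Hypothetical helper for illustration: all operands are assumed to be
    unsigned and to fit in `bits` bits. It is not the paper's SDMM method.
    """
    limit = 1 << bits
    if not all(0 <= v < limit for v in (a, w0, w1)):
        raise ValueError(f"operands must be unsigned {bits}-bit values")

    # Each partial product is below 2**(2 * bits), so placing w1 that many
    # bit positions above w0 keeps the two products from overlapping.
    shift = 2 * bits
    packed = (w1 << shift) | w0

    # The single wide multiplication a DSP block would perform.
    wide = a * packed

    # Split the wide result back into the two narrow products.
    mask = (1 << shift) - 1
    return wide & mask, wide >> shift


if __name__ == "__main__":
    print(packed_multiply(7, 5, 3))  # (35, 21)
```

Signed operands would need extra correction logic that this sketch leaves out. As a side observation, the reported DSP reductions of 66.6%, 75%, and 83.3% are consistent with 3, 4, and 6 multiplications per DSP block (a reduction of 1 - 1/k for k multiplications), although that mapping is an inference from the figures and is not stated in the abstract.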

Files

Near_Precise_Parameter_Approxi... (pdf)

(pdf | 1.11 MB)

- Embargo expired on 01-07-2023

License info not available