A Power-Efficient Parameter Quantization Technique for CNN Accelerators

Conference Paper (2021)
Author(s)

Ercan Kalali (TU Delft - Signal Processing Systems)

Rene van Leuken (TU Delft - Signal Processing Systems)

DOI
https://doi.org/10.1109/DSD53832.2021.00012 (final published version)
Publication Year
2021
Language
English
Article number
9556397
Pages (from-to)
18-23
ISBN (print)
978-1-6654-2704-3
ISBN (electronic)
978-1-6654-2703-6
Event
Euromicro Conference on Digital System Design (DSD 2021)

Abstract

Quantization techniques are widely used in CNN inference to reduce hardware cost at the expense of small accuracy losses. However, even after quantization, fixed-point quantized CNN weights still incur a multiplication cost. Therefore, a novel CNN quantization technique is introduced that can be implemented without any multiplier. We evaluated our quantization technique on the VGG-16 and AlexNet networks with the Tiny ImageNet dataset. For 8-bit CNN weights, the quantization technique causes accuracy losses of 0.39% and 0.98% compared to the floating-point implementations of VGG-16 and AlexNet, respectively. Afterwards, a fine-tuning method for our quantization is introduced, which further reduces the accuracy loss: fine-tuning lowered the accuracy losses of 8-bit quantized VGG-16 and AlexNet to 0.24% and 0.39%, respectively. Two different processing element (PE) architectures, neither of which includes any multiplier hardware, are designed to perform the multiply-accumulate (MAC) operations of CNN models quantized by our technique. Two systolic array prototypes employing these PE architectures are designed for comparison with a traditional fixed-point MAC implementation. The systolic arrays built from our PE designs reduce power consumption by up to 14.2% and 21.6%, respectively.
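
The abstract does not spell out how the multiplier is eliminated; one common way to make CNN MACs multiplier-free is to constrain each weight to a signed power of two, so the product of a weight and an activation reduces to an arithmetic shift. The Python sketch below illustrates that general idea only as an assumption on our part, not the paper's actual scheme; the names quantize_pow2 and shift_mac and the exponent range are hypothetical.

```python
import numpy as np

def quantize_pow2(w, exp_min=-8, exp_max=0):
    """Round each weight to the nearest signed power of two (log-domain).

    A nonzero weight w becomes sign(w) * 2**e with e in [exp_min, exp_max],
    so the multiply inside a MAC reduces to an arithmetic shift.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    e = np.full(w.shape, exp_min, dtype=np.int32)
    nz = mag > 0
    # Rounding in the log domain approximates the nearest power of two.
    e[nz] = np.clip(np.rint(np.log2(mag[nz])), exp_min, exp_max).astype(np.int32)
    q = sign * np.exp2(e)  # quantized weights (still floating-point here)
    return q, e, sign.astype(np.int32)

def shift_mac(acts, exps, signs):
    """Multiplier-free MAC: a * (s * 2**e) == s * (a >> -e) for e <= 0."""
    acc = 0
    for a, e, s in zip(acts, exps, signs):
        acc += int(s) * (int(a) >> int(-e))  # right shift truncates the product
    return acc

# Toy example: quantize four weights, then accumulate shifted 8-bit activations.
w = np.array([0.27, -0.5, 0.031, 0.0])
q, e, s = quantize_pow2(w)
print(q)                                   # [ 0.25 -0.5  0.03125  0. ]
print(shift_mac([100, 64, 200, 7], e, s))  # -1 (exact product sum is -0.75)
```

A real fixed-point datapath would keep activations and partial sums in integer formats and match the exponent range to the weight bit-width; those design choices are only sketched here.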