Fully Pipelined FPGA Acceleration of Binary Convolutional Neural Networks with Neural Architecture Search

Journal Article (2024)
Author(s)

M. Ji (Jilin University, TU Delft - Computer Engineering)

Z. Al-Ars (TU Delft - Computer Engineering)

Yuchun Chang (Dalian University of Technology)

Baolin Zhang (Jilin University)

Research Group
Computer Engineering
DOI
https://doi.org/10.1142/S0218126624501706
Publication Year
2024
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository. ‘You share, we take care!’ – Taverne project, https://www.openaccess.nl/en/you-share-we-take-care. Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses the Dutch legislation to make this work public.
Issue number
10
Volume number
33
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we present a fully pipelined, channel-semi-parallel convolutional neural network hardware accelerator architecture. This structure trades off compute time against hardware utilization, allowing the accelerator to be layer-pipelined without fully parallelizing the input and output channels. A parallel strategy reduces the time gap when transferring output results between layers, and the degree of parallelism can be chosen based on the hardware resources of the target FPGA. We use this structure to implement a binary ResNet18 designed with a neural architecture search strategy, which improves the accuracy of manually designed binary convolutional neural networks. Our optimized binary ResNet18 achieves a Top-1 accuracy of 60.5% on the ImageNet dataset. We deploy this ResNet18 implementation on an Alphadata 9H7 FPGA, connected through an OpenCAPI interface, to demonstrate the hardware capabilities. Depending on the amount of parallelism used, the latency ranges from 1.12 to 6.33 ms, with a corresponding throughput of 4.56 to 0.71 TOPS at different hardware utilizations, at a 200 MHz clock frequency. Our best latency is 8× lower and our best throughput is 1.9× higher than the best previous works. The code for our implementation is open source and publicly available on GitHub at https://github.com/MFJI/NASBRESNET.
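As a rough sanity check of the reported figures (not part of the paper itself), throughput, latency, and the per-inference operation count are related by throughput = ops / latency. The abstract does not state the operation count, so the sketch below infers it from the reported best case (4.56 TOPS at 1.12 ms); treat that inferred value as an assumption.

```python
def implied_ops(tops: float, latency_ms: float) -> float:
    """Operations per inference implied by a throughput (TOPS) and a latency (ms)."""
    return tops * 1e12 * latency_ms * 1e-3


# Best-case numbers from the abstract: 4.56 TOPS at 1.12 ms latency.
best_case_ops = implied_ops(4.56, 1.12)
print(f"{best_case_ops:.3e}")  # roughly 5.1e9 operations per inference
```

Note that the least-parallel configuration (0.71 TOPS at 6.33 ms) implies a slightly smaller operation count, which suggests the throughput figures also reflect differences in hardware utilization across configurations rather than latency alone.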

Files

229978718.pdf
(pdf | 10.2 MB)
- Embargo expired on 07-03-2025
License info not available