Fully Pipelined FPGA Acceleration of Binary Convolutional Neural Networks with Neural Architecture Search

Journal Article (2024)
Author(s)

M. Ji (Jilin University, TU Delft - Computer Engineering)

Z. Al-Ars (TU Delft - Computer Engineering)

Yuchun Chang (Dalian University of Technology)

Baolin Zhang (Jilin University)

Research Group
Computer Engineering
DOI
https://doi.org/10.1142/S0218126624501706
Publication Year
2024
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository. ‘You share, we take care!’ – Taverne project, https://www.openaccess.nl/en/you-share-we-take-care. Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses the Dutch legislation to make this work public.
Issue number
10
Volume number
33
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this paper, we present a fully pipelined, channel-semi-parallel convolutional neural network hardware accelerator architecture. This structure trades off compute time against hardware utilization, allowing the accelerator to be layer-pipelined without fully parallelizing the input and output channels. A parallel strategy reduces the time gap when transferring output results between layers, and the degree of parallelism can be chosen based on the hardware resources of the target FPGA. We use this structure to implement a binary ResNet18 designed with a neural architecture search strategy, which improves the accuracy of manually designed binary convolutional neural networks. Our optimized binary ResNet18 achieves a Top-1 accuracy of 60.5% on the ImageNet dataset. We deploy this ResNet18 implementation on an Alphadata 9H7 FPGA, connected through an OpenCAPI interface, to demonstrate the hardware capabilities. Depending on the amount of parallelism used, the latency ranges from 1.12 to 6.33 ms, with a corresponding throughput of 4.56 to 0.71 TOPS at different hardware utilizations, at a 200 MHz clock frequency. Our best latency is 8× lower and our best throughput is 1.9× higher than the best previous works. The code for our implementation is open source and publicly available on GitHub at https://github.com/MFJI/NASBRESNET.
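As a rough sanity check of the reported figures (not part of the paper itself), throughput, latency, and the per-inference operation count are related by throughput = ops / latency. The abstract does not state the operation count, so the sketch below infers it from the reported best case (4.56 TOPS at 1.12 ms); treat that inferred value as an assumption.

```python
def implied_ops(tops: float, latency_ms: float) -> float:
    """Operations per inference implied by a throughput (TOPS) and a latency (ms)."""
    return tops * 1e12 * latency_ms * 1e-3


# Best-case numbers from the abstract: 4.56 TOPS at 1.12 ms latency.
best_case_ops = implied_ops(4.56, 1.12)
print(f"{best_case_ops:.3e}")  # roughly 5.1e9 operations per inference
```

Note that the least-parallel configuration (0.71 TOPS at 6.33 ms) implies a slightly smaller operation count, which suggests the throughput figures also reflect differences in hardware utilization across configurations rather than latency alone.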

Files

229978718.pdf
(pdf | 10.2 MB)
- Embargo expired on 07-03-2025
License info not available