Integration of a convolutional neural network for speech-to-text recognition in an FPGA compiler flow

Abstract

Deep Neural Networks (DNNs) have grown significantly in size over the past decade. Partly as a result, their accuracy on image classification and speech recognition tasks has improved as well, giving such models great potential for real-world applications. However, their size drives compute and power requirements that are often too large for deployment on edge devices, ruling these models out for a rich field of applications that demand high-throughput, real-time execution.

Deploying quantized DNNs on Field-Programmable Gate Arrays (FPGAs) overcomes this problem. FPGAs are well known for their low-latency, high-throughput, and low-energy capabilities. However, creating hand-tuned FPGA designs requires expert-level knowledge of the underlying hardware. For mathematicians and software engineers who develop new quantized DNNs, but also for experienced hardware designers who want to implement a large DNN on an FPGA, the implementation burden is often too large to reap any practical benefit from accelerating the application on an FPGA.

The open-source FINN compiler, introduced by Xilinx Research Labs, provides an excellent bridge between the software and hardware domains by generating quantized-DNN inference accelerators for FPGAs from a high-level description of the network in the widely adopted open-source ONNX format. Since lower-level implementation details are abstracted away, an open question is how this abstraction affects the performance of the generated accelerator.
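
To give a concrete impression of this flow, the sketch below shows how an accelerator build can be driven through FINN's dataflow builder API. It is a minimal illustration assuming a recent FINN release; the model file name, FPGA part, clock period, and output directory are illustrative placeholders, not the exact settings used in this work.

# Minimal sketch: generating an FPGA dataflow accelerator with FINN
# from a quantized network in ONNX form (e.g. exported from Brevitas).
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg

# Build configuration: output location, target clock, target device,
# and which artifacts FINN should produce.
cfg = build_cfg.DataflowBuildConfig(
    output_dir="build_quartznet",          # placeholder output directory
    synth_clk_period_ns=5.0,               # 200 MHz target clock (example)
    fpga_part="xcu250-figd2104-2L-e",      # example Alveo U250 part
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
        build_cfg.DataflowOutputType.STITCHED_IP,
    ],
)

# Input: the quantized network as an ONNX file (placeholder name).
build.build_dataflow_cfg("quartznet_quantized.onnx", cfg)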

This work examines whether FPGA implementations of CNN-based speech-to-text inference models can be generated automatically by means of FINN. For this purpose, QuartzNet, a sub-state-of-the-art CNN for speech-to-text recognition, is targeted for FPGA acceleration.
To achieve this, extensions to the FINN compiler are proposed that enable the generation of 1D CNN inference accelerators for FPGAs. Furthermore, a proof-of-concept FPGA accelerator for a quantized QuartzNet model is implemented by means of FINN. Compared to a high-end CPU, the proposed FPGA accelerator achieves 7.7x higher throughput and 8.2x lower latency on a speech recognition inference task. Compared to a high-end GPU, it improves energy efficiency by 6.8% at the expense of lower throughput and higher latency.
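
For reference, QuartzNet is built from 1D time-channel separable convolutions: a depthwise convolution over time followed by a pointwise (1x1) convolution, with batch normalization and ReLU. The PyTorch sketch below illustrates this building block; the channel counts, kernel size, and sequence length are placeholders rather than the exact QuartzNet configuration. Layers of this 1D shape are what the proposed FINN extensions must map to FPGA hardware.

# Illustrative sketch of QuartzNet's basic building block: a 1D
# time-channel separable convolution (depthwise over time, then
# pointwise across channels), followed by batch norm and ReLU.
import torch
import torch.nn as nn

class TimeChannelSeparableConv1d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int):
        super().__init__()
        # Depthwise: one filter per channel, sliding along the time axis.
        self.depthwise = nn.Conv1d(
            in_ch, in_ch, kernel_size,
            padding=kernel_size // 2, groups=in_ch, bias=False,
        )
        # Pointwise: 1x1 convolution mixing information across channels.
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm1d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, time)
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example usage with placeholder sizes: 256 channels, kernel length 33,
# applied to 400 feature frames (e.g. a mel-spectrogram).
block = TimeChannelSeparableConv1d(in_ch=256, out_ch=256, kernel_size=33)
features = torch.randn(1, 256, 400)
out = block(features)  # shape (1, 256, 400)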

By generating an FPGA accelerator for a quantized version of QuartzNet, this work bridges the software and hardware domains, showcasing how a trained CNN from the software domain can be transformed into a high-throughput, low-latency, and energy-efficient FPGA accelerator at a fraction of the design effort required for a handwritten RTL implementation.