Title: Integration of a convolutional neural network for speech-to-text recognition in an FPGA compiler flow
Author: Mrahorović, Mirza (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Al-Ars, Z. (mentor); Petri-König, J. (graduation committee); Verwer, S.E. (graduation committee); Umuroglu, Yaman (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Engineering
Date: 2021-09-24

Abstract:
Deep Neural Networks (DNNs) have grown significantly in size over the past decade. Partly as a result, the accuracy of DNNs on image classification and speech recognition tasks has improved as well, giving such models great potential for real-world applications. However, due to their size, the compute and power requirements are often too large to deploy these models on edge devices. This rules out applying such models within a rich field of applications demanding high-throughput, real-time execution. Deploying quantized DNNs on Field Programmable Gate Arrays (FPGAs) overcomes this problem: FPGAs are well known for their low-latency, high-throughput, and low-energy capabilities. However, creating hand-tuned FPGA designs requires expert-level knowledge of the underlying hardware domain. Especially for mathematicians and software engineers who develop new quantized DNNs, but also for experienced hardware designers who want to implement a large DNN on an FPGA, the implementation burden is often too large to reap any practical benefit from accelerating the application on an FPGA.
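The quantization mentioned above replaces floating-point weights with low-bit-width integers plus a scale factor, which is what makes the arithmetic cheap in FPGA logic. A minimal sketch of one common scheme (symmetric, per-tensor uniform quantization) is shown below; this particular scheme and the function names are illustrative assumptions, not necessarily the exact method used in the thesis:

```python
def quantize_uniform(values, num_bits):
    """Symmetric per-tensor uniform quantization to signed integers.

    Illustrative only: maps floats into [-qmax, qmax] where
    qmax = 2**(num_bits-1) - 1, using a single scale factor.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = (max_abs / qmax) if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.75]
q, s = quantize_uniform(weights, num_bits=8)
restored = dequantize(q, s)
```

The accelerator can then store and multiply small integers, applying the scale only where needed; the cost is a bounded rounding error per weight (at most one quantization step).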
The open-source FINN compiler, introduced by Xilinx Research Lab, provides an excellent bridge between the software and hardware domains by allowing quantized DNN inference FPGA accelerators to be generated from a high-level description of the quantized DNN in the widely adopted open-source ONNX format. Because lower-level implementation details are abstracted away, the question is how this affects the performance of the generated accelerator. This work examines whether FPGA implementations of CNN-based models for speech-to-text inference can be generated automatically by means of FINN. For this purpose, a sub-state-of-the-art CNN for speech-to-text recognition, named QuartzNet, is targeted for FPGA acceleration. To achieve this, extensions to the FINN compiler are proposed that enable generating 1D CNN inference accelerators for FPGAs. Furthermore, a proof-of-concept FPGA accelerator for a quantized QuartzNet model is implemented by means of FINN. Compared to a high-end CPU device, the proposed FPGA accelerator achieves 7.7x higher throughput and 8.2x lower latency on a speech recognition inference task. Compared to a high-end GPU device, the proposed FPGA accelerator improves energy efficiency by 6.8% at the expense of lower throughput and higher latency. By generating an FPGA accelerator for a quantized version of the QuartzNet model, this work bridges the software and hardware domains, showcasing how a trained CNN in the software domain can be transformed into a high-throughput, low-latency, and energy-efficient FPGA accelerator with a fraction of the design effort required compared to a handwritten RTL implementation.
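The core operation that the proposed 1D CNN extensions map to hardware is a sliding-window dot product over the time axis (QuartzNet applies such convolutions to audio feature sequences). A minimal plain-Python sketch of this computation is given below; it is illustrative only, since the generated accelerator operates on quantized integers with a streaming dataflow architecture rather than Python lists:

```python
def conv1d(signal, kernel, stride=1):
    """Valid-mode 1D convolution (cross-correlation, as used in DNNs).

    Slides the kernel over the signal and computes a dot product
    at each position, advancing by `stride` samples per output.
    """
    k = len(kernel)
    out_len = (len(signal) - k) // stride + 1
    return [
        sum(signal[i * stride + j] * kernel[j] for j in range(k))
        for i in range(out_len)
    ]


# An edge-detecting kernel over a ramp signal: each window computes
# x[i] - x[i+2], so every output is -2 for this input.
x = [1, 2, 3, 4, 5]
w = [1, 0, -1]
y = conv1d(x, w)  # -> [-2, -2, -2]
```

A full QuartzNet layer applies many such kernels across many input channels, but the per-output arithmetic pattern is exactly this multiply-accumulate loop, which is what FINN unrolls and pipelines into FPGA logic.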
Subject: FPGA; Hardware acceleration; CNN; speech-to-text; Inference; compiler; Vivado HLS; FINN
To reference this document use: http://resolver.tudelft.nl/uuid:b6c889b1-e06f-447c-af69-55708555bf90
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Mirza Mrahorović
Files: PDF, mirza_mrahorovic_msc_thesis.pdf (3.97 MB)