FPGA Based Deep Learning Accelerator for RF Applications

A Design Framework


Abstract

Recently, interest in using deep learning for RF applications has increased. However, most of these studies focus on developing deep learning models for a particular RF application. This master thesis therefore focuses on implementing such models on FPGAs, so that they can be used in an FPGA-based Software Defined Radio.

In this master thesis, a custom FPGA accelerator for CNN models is designed from reusable and configurable building blocks. The accelerator uses a streaming architecture and is fully pipelined, so it accepts new input data every clock cycle. A key design aspect is that every building block can operate on a portion of its input data. This means a building block can produce an output as soon as enough input data is available. As a result, the work each building block has to perform is spread out over time, and the memory required for storing data is reduced. Moreover, the precision of the fixed-point parameters and operations is configurable, so the accelerator is not restricted to binary or ternary operations.
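The streaming idea described above (a block that holds only the samples it needs and emits an output as soon as its window is full, using configurable fixed-point precision) can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the thesis's hardware design: it models a single-channel 1D convolution, and the names `to_fixed`, `StreamingConv1D`, `frac_bits`, and `total_bits` are hypothetical. Each call to `push` stands in for one clock cycle.

```python
from collections import deque


def to_fixed(x: float, frac_bits: int, total_bits: int) -> int:
    """Quantize x to a signed fixed-point integer, saturating at the format limits."""
    scaled = round(x * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


class StreamingConv1D:
    """Hypothetical single-channel streaming convolution block.

    Accepts one sample per push() ("clock cycle") and keeps only the last
    K samples (K = kernel length), so memory does not grow with input length.
    """

    def __init__(self, weights, frac_bits=6, total_bits=8):
        self.frac_bits = frac_bits
        # Weights are stored in the same configurable fixed-point format as the data.
        self.w = [to_fixed(w, frac_bits, total_bits) for w in weights]
        # Shift-register-like window: the oldest sample drops out automatically.
        self.window = deque(maxlen=len(self.w))

    def push(self, sample: int):
        """Accept one fixed-point sample; return an output once the window is full, else None."""
        self.window.append(sample)
        if len(self.window) < len(self.w):
            return None  # not enough input data yet
        # Products carry 2 * frac_bits fractional bits; the accumulator is left wide (unsaturated).
        acc = sum(x * w for x, w in zip(self.window, self.w))
        # Arithmetic right shift rescales the result back to frac_bits fractional bits.
        return acc >> self.frac_bits


if __name__ == "__main__":
    conv = StreamingConv1D([0.5, 0.25, -0.5], frac_bits=6, total_bits=8)
    samples = [to_fixed(v, 6, 8) for v in [1.0, 0.5, -0.25, 0.75]]
    print([conv.push(s) for s in samples])  # [None, None, 48, -12]
```

With 6 fractional bits, the output 48 corresponds to 0.75 and -12 to -0.1875, which match the floating-point results. Changing `frac_bits` and `total_bits` models the configurable precision, and the first output appears after only three samples rather than after the whole input has arrived.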

The accelerator has been tested on the automatic modulation classification problem. The result is an accelerator that can process real-time data at 600 MHz and consumes fewer FPGA resources than comparable designs. In a direct comparison with hls4ml, the custom accelerator achieves 2.4 times higher throughput and 2.3 times lower latency for the identical CNN, while achieving the same accuracy and significantly lower resource utilization. In addition, the custom accelerator is compared with the ternary neural network FPGA accelerator for modulation classification proposed by Tridgell et al. The custom accelerator uses 3.3 times fewer LUTs, 9 times fewer FFs, 4 times fewer DSPs, and no BRAM, whereas the accelerator proposed by Tridgell et al. uses 48.5% of the available BRAM in an RFSoC FPGA.

Files

MSc_Thesis_HansdenBoer.pdf
(pdf | 2.01 Mb)
License info not available