A 22nm Low-Power Gated Recurrent Unit Accelerator for Digital Pre-Distortion of Wideband Power Amplifiers
H. Wu (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Chang Gao – Mentor (TU Delft - Electronics)
Leonardus Cornelis Nicolaas de Vreede – Graduation committee member (TU Delft - Electronics)
Rajendra Bishnoi – Graduation committee member (TU Delft - Computer Engineering)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
As communication capacity continues to expand, deep neural networks (DNNs) have become increasingly prominent in digital pre-distortion (DPD) for compensating the non-linearity of wideband power amplifiers (PAs). The fifth-generation (5G) era imposes stricter requirements on DPD in terms of frequency and latency, and the integration of multiple-input multiple-output (MIMO) technology and micro base stations is driving the trend towards low-power, small-area DPD chips. This paper presents a high-performance hardware architecture based on the Gated Recurrent Unit (GRU), characterized by high parallelism and low resource consumption, that enables real-time DPD signal processing. A novel method is proposed that employs quantization-aware training (QAT) with Hardsigmoid and Hardtanh functions to quantize the floating-point model in software. The optimized algorithm is implemented in hardware with inter-layer pipelining and retiming to improve timing and increase the clock frequency. Additionally, the hardware-efficient piecewise-linear functions Hardsigmoid and Hardtanh are used as activation functions to minimize hardware overhead. Experimental results show that the hardware implementation achieves an Adjacent Channel Power Ratio (ACPR) of 49.48 dBc and an Error Vector Magnitude (EVM) of 46.05 dB, showing minimal degradation compared with the floating-point model (49.58 dBc / 46.70 dB). Simulated in 22nm CMOS technology and operating at 2 GHz, the DPD chip occupies an area of 0.047 mm² and can handle signals with bandwidths of up to 70 MHz. It achieves a peak throughput of 256.5 GOp/s and a power efficiency of 1.3154 TOp/s/W.
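The abstract describes the quantized GRU only in words. The following is a minimal, hypothetical Python sketch of what one forward step of a GRU with Hardsigmoid/Hardtanh activations and fake quantization (the forward half of QAT) might look like. It assumes the common PyTorch-style GRU gate equations, PyTorch's definitions of Hardsigmoid (clamp(x/6 + 1/2, 0, 1)) and Hardtanh (clamp(x, -1, 1)), and a hypothetical 16-bit fixed-point format with 8 fractional bits. The thesis's actual gate layout, activation slopes, input features, and bit widths are not stated here, so all of these are assumptions. A real QAT setup would also need a gradient rule such as the straight-through estimator, which this forward-only sketch omits.

```python
"""Forward pass of one GRU time step with hard activations and fake quantization.

Hypothetical sketch: bit widths, weight layout and the Hardsigmoid slope are
assumptions for illustration, not the thesis design.
"""
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def hardsigmoid(x: float) -> float:
    # Same definition as PyTorch's nn.Hardsigmoid: clamp(x / 6 + 1/2, 0, 1).
    return min(max(x / 6.0 + 0.5, 0.0), 1.0)


def hardtanh(x: float) -> float:
    # Same definition as PyTorch's nn.Hardtanh with default bounds: clamp(x, -1, 1).
    return min(max(x, -1.0), 1.0)


def fake_quant(x: float, frac_bits: int = 8, total_bits: int = 16) -> float:
    """Round x onto a signed fixed-point grid and return it as a float.

    Assumed format: total_bits-bit two's complement with frac_bits fractional bits.
    Python's round() rounds halves to even; hardware may round differently.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    code = min(max(round(x * scale), lo), hi)  # saturate instead of wrapping
    return code / scale


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def gru_cell_hard(x: Vector, h: Vector, W: Dict[str, Matrix],
                  U: Dict[str, Matrix], b: Dict[str, Vector]) -> Vector:
    """One GRU step; W, U, b are keyed by gate: 'r' (reset), 'z' (update), 'n' (candidate)."""
    q = fake_quant
    # Fake-quantize weights and biases, as a QAT forward pass would.
    Wq = {g: [[q(w) for w in row] for row in W[g]] for g in "rzn"}
    Uq = {g: [[q(w) for w in row] for row in U[g]] for g in "rzn"}
    bq = {g: [q(v) for v in b[g]] for g in "rzn"}
    wx = {g: matvec(Wq[g], x) for g in "rzn"}
    uh = {g: matvec(Uq[g], h) for g in "rzn"}
    r = [q(hardsigmoid(a + c + d)) for a, c, d in zip(wx["r"], uh["r"], bq["r"])]
    z = [q(hardsigmoid(a + c + d)) for a, c, d in zip(wx["z"], uh["z"], bq["z"])]
    # The reset gate scales only the recurrent part of the candidate state.
    n = [q(hardtanh(a + ri * c + d))
         for a, ri, c, d in zip(wx["n"], r, uh["n"], bq["n"])]
    # New state is a convex mix of candidate and old state (z lies in [0, 1]).
    return [q((1.0 - zi) * ni + zi * h_old) for zi, ni, h_old in zip(z, n, h)]


if __name__ == "__main__":
    random.seed(0)
    hidden, inputs = 4, 2  # inputs: one I/Q sample (hypothetical feature set)

    def rand_mat(rows: int, cols: int) -> Matrix:
        return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    W = {g: rand_mat(hidden, inputs) for g in "rzn"}
    U = {g: rand_mat(hidden, hidden) for g in "rzn"}
    b = {g: [0.0] * hidden for g in "rzn"}
    h = [0.0] * hidden
    for iq in [(0.3, -0.1), (0.5, 0.2), (-0.4, 0.6)]:
        h = gru_cell_hard(list(iq), h, W, U, b)
    print(h)
```

Both activations reduce to a scale, an add, and two comparisons, so no exponential or lookup table is needed, which is the kind of hardware saving the abstract refers to. Note that the 1/6 slope used here is PyTorch's choice and is not a power of two; a hardware design might prefer a shift-friendly slope instead.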
Files
File under embargo until 31-12-2025