Exploration of the AMD Ryzen NPU for Real-time Signal Processing

Real-time Imaging of LOFAR Station Data

Master Thesis (2025)
Author(s)

J.A. Fortanet Capetillo (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

H.P. Hofstee – Mentor (TU Delft - Computer Engineering)

Alle Jan van der Veen – Graduation committee member (TU Delft - Signal Processing Systems)

Steven van der Vlugt – Graduation committee member (Netherlands Institute for Radio Astronomy (ASTRON))

Mario Ruiz Noguera – Graduation committee member (AMD)

Zaid Al-Ars – Graduation committee member (Trinilytics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
30-06-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The growing prevalence of Artificial Intelligence (AI) applications has led to the development of specialized hardware accelerators optimized for performance and energy efficiency. One such accelerator is the Ryzen Neural Processing Unit (NPU), integrated into AMD’s Ryzen AI processors. While primarily designed for AI workloads, this thesis investigates the potential of repurposing the Ryzen NPU for Digital Signal Processing (DSP) applications, with a focus on radio astronomy. Using the All-Sky Imaging Algorithm from the LOFAR telescope system as a case study, the research evaluates whether the NPU can meet the real-time data processing demands imposed by LOFAR's 10 Hz data generation rate.

Four implementations of the algorithm were developed: three using the MLIR-AIE toolchain and one using the TINA framework. These implementations explored various parallelization and pipelining strategies to optimize performance while ensuring correctness and minimal power consumption. Experimental evaluations revealed up to a 77.4× speedup over a CPU baseline and a 2.84× speedup over a GPU implementation. Notably, three of the four implementations met the 10 Hz real-time requirement. All implementations yielded accurate results, with only minor variations due to differences in data types.
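
The real-time criterion above reduces to simple arithmetic: a 10 Hz data rate leaves 100 ms to process each frame. The following is a minimal sketch of that check, not code from the thesis. The function names and the example latencies are hypothetical placeholders rather than measured results, and the check assumes frames are processed one at a time with no overlap between consecutive frames.

```python
# Illustrative sketch only. Latencies used here are hypothetical placeholders,
# NOT measurements reported in the thesis.

LOFAR_RATE_HZ = 10.0  # data generation rate stated in the abstract


def frame_budget_ms(rate_hz: float) -> float:
    """Time available to process one frame before the next one arrives."""
    return 1000.0 / rate_hz


def meets_real_time(latency_ms: float, rate_hz: float = LOFAR_RATE_HZ) -> bool:
    """True if a single frame is processed within the per-frame budget."""
    return latency_ms <= frame_budget_ms(rate_hz)


def speedup(baseline_ms: float, accelerated_ms: float) -> float:
    """Ratio of baseline latency to accelerated latency."""
    return baseline_ms / accelerated_ms


if __name__ == "__main__":
    print(f"Per-frame budget at {LOFAR_RATE_HZ} Hz: {frame_budget_ms(LOFAR_RATE_HZ)} ms")
    # Hypothetical latencies, chosen only to exercise both outcomes.
    for name, latency_ms in [("example_fast", 40.0), ("example_slow", 150.0)]:
        status = "meets" if meets_real_time(latency_ms) else "misses"
        print(f"{name}: {latency_ms} ms {status} the real-time budget")
```

With pipelined execution, sustained throughput rather than single-frame latency would be the relevant quantity, so this check is conservative for such designs.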

Although power consumption data for the NPU implementations was unavailable, the performance gains underscore the Ryzen NPU's potential for non-AI workloads. This thesis provides a proof of concept for DSP acceleration on the Ryzen NPU, contributes a new layer to the TINA toolchain, and offers insights for future application development.

Files

TUD_thesis.pdf
(pdf | 4.78 MB)
License info not available