Efficient mmWave Point-Clouds for Embedded Devices
Evaluating Real-Time Performance of Embedded Millimeter-Wave Radar Pre-Processing Pipelines
A. Nicolaou (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Rosi – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M.A. Zuñiga Zamalloa – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
J.M. Weber – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Human Pose Estimation (HPE), the automatic tracking of the position and alignment of human limbs, was pioneered by camera-based systems such as the Microsoft Kinect and remains critical across domains from interactive gaming to healthcare patient monitoring. Millimeter-wave (mmWave) radar has emerged as a compelling alternative: by using electromagnetic waves to detect points on the surface of objects, it offers a more cost-effective, privacy-preserving, and robust solution than traditional cameras. However, the spatial "point-clouds" generated by mmWave radars are particularly irregular and require pre-processing before they can be fed into deep learning models. While these pre-processing techniques are well documented and easy to implement in high-level environments like Python, adapting and optimizing the pipelines for low-power embedded devices remains an underexplored challenge. It is currently unclear whether point-cloud pre-processing can fit within the memory and computational limits of such devices.
This thesis profiles the memory footprint and latency of mmWave point-cloud pre-processing on micro-controllers, specifically an STM32 Cortex-M7 with 320 KB of SRAM. The target is real-time performance, defined here as processing each data sample in under 100 ms.
We propose and evaluate seven pipeline variants that incorporate hardware acceleration, lightweight alternative algorithms, pipeline restructuring to eliminate computational redundancies, and a single-pass iteration strategy to minimize cache misses. Experimental results demonstrate that structural optimization reduces peak memory consumption from 90 KB to 50 KB, approaching the theoretical lower bound dictated by the output buffers. Our most optimized configuration achieves an average latency of 8.13 ms (with a worst-case peak of 12 ms), well within the 100 ms real-time constraint.
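To illustrate the idea behind a single-pass iteration strategy, the following is a minimal sketch in C, not the pipeline evaluated in this thesis. It assumes a hypothetical `point_t` layout and two hypothetical stages (a range filter and a centroid computation) that would otherwise each loop over the frame separately. Fusing them means each point is loaded once, and compacting in place avoids a second buffer.

```c
#include <stddef.h>

/* Hypothetical point layout, assumed for illustration only. */
typedef struct {
    float x, y, z;
} point_t;

/* Hypothetical per-frame result of the fused pass. */
typedef struct {
    size_t kept;      /* number of points that passed the range filter */
    float cx, cy, cz; /* centroid of the kept points (0 if none kept)  */
} frame_stats_t;

/*
 * Single pass over the frame: drop points farther than max_range from the
 * origin, compact the survivors to the front of pts, and accumulate their
 * centroid, all in one loop.
 */
static frame_stats_t filter_and_centroid(point_t *pts, size_t n, float max_range)
{
    frame_stats_t s = {0, 0.0f, 0.0f, 0.0f};
    const float max_r2 = max_range * max_range; /* compare squared ranges, no sqrt */

    for (size_t i = 0; i < n; ++i) {
        const point_t p = pts[i];
        const float r2 = p.x * p.x + p.y * p.y + p.z * p.z;
        if (r2 > max_r2) {
            continue; /* out of range: skip without a second visit */
        }
        pts[s.kept++] = p; /* in-place compaction, write index never passes i */
        s.cx += p.x;
        s.cy += p.y;
        s.cz += p.z;
    }

    if (s.kept > 0) {
        const float inv = 1.0f / (float)s.kept;
        s.cx *= inv;
        s.cy *= inv;
        s.cz *= inv;
    }
    return s;
}
```

After the call, the first `kept` entries of `pts` hold the filtered points in their original order, so a later stage can consume them without another copy.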
Further analysis reveals that the average point count per frame is the primary driver of computational performance. Overall, this work shows that efficient, real-time, end-to-end radar pre-processing is viable on highly resource-constrained micro-controllers.