Accurate Memory Profiling for Machine Learning Models on Microcontrollers

Master Thesis (2025)
Author(s)

W. Liang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Marco Zuñiga Zamalloa – Mentor (TU Delft - Networked Systems)

H. Liu – Mentor (TU Delft - Networked Systems)

Q. Wang – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
07-07-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Neural network inference on microcontrollers faces severe RAM constraints that hinder both performance and deployment. A key bottleneck is peak memory usage, which determines whether a deep learning model can run inference within the available RAM and with low latency. To address this, previous researchers have developed peak memory estimators whose predictions guide inference-time optimization techniques, such as pruning, under RAM constraints. However, many of the estimators used by current state-of-the-art frameworks such as µNAS and TinyEngine underestimate or overestimate peak memory, reducing the reliability of model decisions made under RAM constraints. Underestimation typically arises from failing to account for all components that contribute to peak memory usage, while overestimation occurs when memory overheads irrelevant to MCU-specific inference scenarios are incorrectly included. In this thesis, we propose a peak memory estimator that operates at the operator level and accurately estimates the peak memory usage of a deep learning model during inference. Experiments show that our method achieves more accurate peak memory predictions across multiple MCU platforms and can be effectively integrated with pruning strategies to produce compressed models that satisfy both memory and accuracy constraints. Specifically, on a benchmark of 150 models, our method achieved an average estimation error margin of only 0.9%, significantly outperforming µNAS, which exhibited an average error margin of 61.7%.
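To make the operator-level idea concrete, the sketch below models peak memory as the maximum, over the operators in execution order, of the memory live at each operator: its input buffers, its output buffer, and any kernel scratch space. This is a minimal illustration assumed from the abstract's description, not the thesis's actual estimator; all names and sizes are hypothetical, and real estimators must additionally track tensor lifetimes across operators and allocator alignment.

    from dataclasses import dataclass

    @dataclass
    class Op:
        """One operator in a topologically ordered inference graph.
        Sizes are in bytes. 'scratch' is transient kernel workspace
        (e.g. im2col buffers), a component that underestimating
        estimators often omit."""
        name: str
        input_sizes: list   # bytes of each input activation tensor
        output_size: int    # bytes of the output activation tensor
        scratch: int = 0    # workspace used only while this op runs

    def estimate_peak_memory(ops):
        """Simplified estimate: memory live at an operator is the sum
        of its inputs, output, and scratch; the peak is the maximum
        over the execution order."""
        peak = 0
        for op in ops:
            live = sum(op.input_sizes) + op.output_size + op.scratch
            peak = max(peak, live)
        return peak

    # Hypothetical three-operator graph (sizes illustrative only).
    ops = [
        Op("conv1", input_sizes=[3072], output_size=16384, scratch=4096),
        Op("conv2", input_sizes=[16384], output_size=8192, scratch=2048),
        Op("dense", input_sizes=[8192], output_size=40),
    ]
    print(f"estimated peak: {estimate_peak_memory(ops)} bytes")

Under this simplification, conv1 dominates with 3072 + 16384 + 4096 = 23552 bytes; dropping the scratch term, as an underestimating estimator might, would shave 4 KB off the prediction and could lead to a model that does not fit on the target MCU.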

Files

License info not available

File under embargo until 15-07-2027