Accurate Memory Profiling for Machine Learning Models on Microcontrollers

Master Thesis (2025)
Author(s)

W. Liang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Marco Zuñiga Zamalloa – Mentor (TU Delft - Networked Systems)

H. Liu – Mentor (TU Delft - Networked Systems)

Q. Wang – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
07-07-2025
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Neural network inference on microcontrollers faces severe RAM constraints that hinder both performance and deployment. A key bottleneck is peak memory usage, which determines whether a deep learning model can run inference within the available RAM and with low latency. To address this, previous researchers have developed peak memory estimators whose predictions guide inference-time optimization techniques, such as pruning, under RAM constraints. However, many of the estimators used by current state-of-the-art frameworks such as µNAS and TinyEngine underestimate or overestimate peak memory, reducing the reliability of model decisions made under RAM constraints. Underestimation typically arises from failing to account for all components that contribute to peak memory usage, while overestimation occurs when memory overheads irrelevant to MCU-specific inference scenarios are incorrectly included. In this thesis, we propose a peak memory estimator that operates at the operator level and accurately estimates the peak memory usage of a deep learning model during inference. Experiments show that our method achieves more accurate peak memory predictions across multiple MCU platforms and can be effectively integrated with pruning strategies to produce compressed models that satisfy both memory and accuracy constraints. Specifically, on a benchmark of 150 models, our method achieved an average estimation error margin of only 0.9%, significantly outperforming µNAS, which exhibited an average error margin of 61.7%.
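To make the operator-level idea concrete, the sketch below models peak memory as the maximum, over the operators in execution order, of the memory live at each operator: its input buffers, its output buffer, and any kernel scratch space. This is a minimal illustration assumed from the abstract's description, not the thesis's actual estimator; all names and sizes are hypothetical, and real estimators must additionally track tensor lifetimes across operators and allocator alignment.

    from dataclasses import dataclass

    @dataclass
    class Op:
        """One operator in a topologically ordered inference graph.
        Sizes are in bytes. 'scratch' is transient kernel workspace
        (e.g. im2col buffers), a component that underestimating
        estimators often omit."""
        name: str
        input_sizes: list   # bytes of each input activation tensor
        output_size: int    # bytes of the output activation tensor
        scratch: int = 0    # workspace used only while this op runs

    def estimate_peak_memory(ops):
        """Simplified estimate: memory live at an operator is the sum
        of its inputs, output, and scratch; the peak is the maximum
        over the execution order."""
        peak = 0
        for op in ops:
            live = sum(op.input_sizes) + op.output_size + op.scratch
            peak = max(peak, live)
        return peak

    # Hypothetical three-operator graph (sizes illustrative only).
    ops = [
        Op("conv1", input_sizes=[3072], output_size=16384, scratch=4096),
        Op("conv2", input_sizes=[16384], output_size=8192, scratch=2048),
        Op("dense", input_sizes=[8192], output_size=40),
    ]
    print(f"estimated peak: {estimate_peak_memory(ops)} bytes")

Under this simplification, conv1 dominates with 3072 + 16384 + 4096 = 23552 bytes; dropping the scratch term, as an underestimating estimator might, would shave 4 KB off the prediction and could lead to a model that does not fit on the target MCU.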

Files

License info not available

File under embargo until 15-07-2027