Microcontroller-based neural network inference faces significant RAM constraints that hinder performance and deployment. One of the main constraints is peak memory usage, which must be kept within the available RAM to conduct deep learning inference with low latency. To address this, previous researchers have developed peak memory estimators, whose estimates can be used by inference-time optimization techniques such as pruning to satisfy the RAM constraint. However, many of the peak memory estimators used by current state-of-the-art frameworks, such as µNAS and TinyEngine, underestimate or overestimate peak memory usage, reducing the reliability of model decisions made under RAM constraints. Underestimation often arises from failing to account for all components that contribute to peak memory usage, while overestimation can occur when memory overheads that are irrelevant in MCU-specific inference scenarios are incorrectly included. In this paper, we propose a peak memory estimator that operates at the operator level and accurately estimates the peak memory usage of a deep learning model during inference. Experiments show that our method achieves more accurate peak memory predictions across multiple MCU platforms and can be effectively integrated with pruning strategies to produce better model compression that satisfies both memory and accuracy constraints. Specifically, on a benchmark of 150 models, our method achieved an average estimation error of only 0.9%, significantly outperforming µNAS, which exhibited an average error of 61.7%.
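For intuition, the sketch below illustrates what operator-level peak memory estimation means in its simplest form: under a sequential-executor assumption, the peak is the largest working set (inputs, output, and scratch buffers) required by any single operator. The `Operator` fields and `estimate_peak_memory` helper are hypothetical and are not the estimator proposed in this paper, which additionally accounts for MCU-specific components discussed later.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Operator:
    """A single operator in the inference graph (illustrative structure only)."""
    name: str
    input_bytes: List[int]   # sizes of the input activation buffers
    output_bytes: int        # size of the output activation buffer
    scratch_bytes: int = 0   # operator workspace, e.g. an im2col buffer

def estimate_peak_memory(ops: List[Operator]) -> int:
    """Simplified operator-level peak estimate.

    Assumes a sequential executor in which, while an operator runs, its
    inputs, its output, and its scratch buffer must all be resident in RAM.
    The peak is then the maximum working set over all operators.
    """
    peak = 0
    for op in ops:
        working_set = sum(op.input_bytes) + op.output_bytes + op.scratch_bytes
        peak = max(peak, working_set)
    return peak

# Example: a small two-operator graph (sizes in bytes).
graph = [
    Operator("conv1", input_bytes=[3 * 32 * 32], output_bytes=16 * 32 * 32, scratch_bytes=4096),
    Operator("dense", input_bytes=[16 * 32 * 32], output_bytes=10),
]
print(estimate_peak_memory(graph))
```

A real estimator must also decide which buffers remain live across operators (e.g. residual connections) and include runtime overheads that matter on MCUs, which is where naive estimates tend to under- or overestimate.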