Adapting Mamba Models for Deployment on Microcontrollers

Enabling Linear-Time Sequence Modeling on Ultra-Low-Power Tiny Devices

Bachelor Thesis (2026)

Author(s)

B. Drabiński (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Q. Wang – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

B. Yang – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.A. Neerincx – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

TensorFlow Lite for Microcontrollers, Machine Learning, TinyML, Microcontrollers

To reference this document use

https://resolver.tudelft.nl/uuid:1d2c2f73-f702-4d94-9af3-1dca52275665

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

As machine learning expands into diverse domains, TinyML has emerged as a crucial paradigm for deploying models on highly resource-constrained microcontrollers, which typically have less than 256 KB of RAM. However, executing complex mathematical operations on these devices remains a significant challenge that calls for novel model designs and hardware-aware optimization.
The Mamba architecture, built around a State Space Model (SSM), is a promising candidate due to its compact parameterization and strong performance on long-context tasks. Nevertheless, Mamba was originally designed for highly parallel GPUs, making its adaptation for TinyML non-trivial. This paper evaluates Mamba deployment strategies on microcontrollers using TensorFlow Lite Micro.
We propose architecture modifications and optimization techniques tailored specifically to microcontroller constraints. Our deployment of a quantized Mamba model achieves a peak RAM footprint of 60.4 KB on a Keyword Spotting task, a 74% memory reduction compared to the state-of-the-art work (MambaLite-Micro). Furthermore, we analyze the trade-offs of quantization, demonstrating that while it substantially reduces memory, it can introduce latency overhead on hardware that lacks acceleration for INT8 operations.
To mitigate code-size and loop-unrolling overheads, we introduce a model-splitting technique that enables the execution of larger models. Our findings demonstrate that while Mamba is a viable architecture for TinyML, further research is required to fully optimize State Space Model implementations for edge hardware.
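
For readers unfamiliar with why a State Space Model suits small RAM budgets, the following is a minimal sketch of a sequential SSM recurrence. It is not the thesis implementation: the function name `ssm_scan`, the diagonal state matrix, and the fixed parameters `a`, `b`, `c`, `dt` are assumptions chosen for illustration (Mamba makes several of these input-dependent). The point it shows is that each step touches only a fixed-size state vector, so working memory does not grow with sequence length.

```python
import math

def ssm_scan(xs, a, b, c, dt):
    """Run a diagonal, discretized state space recurrence over a 1-D sequence.

    Per state channel i (illustrative, fixed parameters):
        h_t[i] = exp(dt * a[i]) * h_{t-1}[i] + dt * b[i] * x_t
        y_t    = sum_i c[i] * h_t[i]
    """
    a_bar = [math.exp(dt * ai) for ai in a]  # discretized per-channel decay
    b_bar = [dt * bi for bi in b]            # simplified (Euler-style) input scaling
    h = [0.0] * len(a)                       # the only state carried between steps
    ys = []
    for x in xs:                             # one pass: time is linear in len(xs)
        h = [ab * hi + bb * x for ab, hi, bb in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys

if __name__ == "__main__":
    # Impulse input: the output decays according to exp(dt * a).
    print(ssm_scan([1.0, 0.0, 0.0], a=[-1.0], b=[1.0], c=[1.0], dt=1.0))
```

In this sketch the state `h` has as many entries as there are state channels, regardless of how long `xs` is, which is the property that makes recurrent SSM inference attractive when RAM is measured in kilobytes.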

Files

Paper.pdf

(pdf | 15.4 MB)

License info not available