Q. Wang | TU Delft Repository

ScreenSense: Utilizing Communication Signals for Dynamic Finger Tracking for On-Screen Antennas

Bachelor thesis (2026) - S.P. Gupta, Qing Wang, Shun Zhuge, M.A. Neerincx

Future 6G smartphones are proposed to embed transparent on-screen antenna arrays that use communication signals for passive finger tracking. Our research proposes two novel localisation methods that exploit the finger's electromagnetic backscattering response. Using model-generated time-series data, we simulate the spatiotemporal backscattering of a finger hovering above a transparent planar array at sub-terahertz frequencies. We compare a classical matched filter and subspace methods against our proposed approaches: a CNN-adapted matched filter (MF-CNN) and a multi-tone CNN position regressor (MT-CNN), alongside a near-field subspace baseline. The learned methods achieve sub-millimeter accuracy and remain robust across variations in signal-to-noise ratio, array size, dielectric properties, and hover height, with MT-CNN offering the best trade-off between accuracy and latency. ...

Training Strategies for Binary/Ternary Neural Networks

Bachelor thesis (2026) - R.B. Kiemes, Q. Wang, B. Refalo, I.M. Olkhovskaia

Binary and ternary neural networks offer substantial reductions in memory and computational cost, making them attractive for deployment on resource-constrained devices. Training these networks remains challenging because quantization functions are non-differentiable, requiring gradient approximations such as the Straight-Through Estimator (STE).

This work presents a systematic ablation study of the effects of different training configurations on ResNet-20 on CIFAR-10. We evaluated eleven STE variants and independently examined the effects of weight clipping and batch normalization. All ternary variants perform within 0.73 percentage points of the 91.61% full-precision baseline, with the polynomial STE achieving the best result of 91.23%. For binary, all variants fall within 1.66 percentage points of the baseline, with the tanh STE being the highest performer (90.35%). We find that the choice of STE has only a minor impact on final accuracy; however, STEs differ in training stability, with smoother estimators providing more consistent convergence.

Batch normalization had the greatest effect on performance; removing it reduced accuracy by up to 8.66 percentage points. Weight clipping yielded a smaller but consistent benefit, with an optimal clipping factor of f = 4.0, improving accuracy by 0.26 and 0.5 percentage points, respectively. Combining these findings, we identified effective training configurations for both ternary and binary networks: the optimal ternary setup (using Trained Ternary Quantization) achieved 91.52% accuracy on ResNet-20/CIFAR-10, while the optimal binary configuration (using XNOR-Net quantization) reached 90.78% accuracy, an improvement over prior baselines in both cases.

...
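
The role of the Straight-Through Estimator described above can be illustrated with a minimal sketch. It assumes a fixed threshold `delta` for ternarization and a hard-tanh-style STE that passes the upstream gradient through only where the latent weight lies inside a clipping range. The function names and both parameters are illustrative assumptions, and this is not necessarily one of the eleven variants the thesis evaluates.

```python
def ternarize(w, delta):
    """Forward pass: map a real-valued latent weight to {-1, 0, +1}.

    `delta` is an assumed fixed threshold; real schemes often derive it
    from the weight statistics of each layer.
    """
    if w > delta:
        return 1.0
    if w < -delta:
        return -1.0
    return 0.0


def ste_grad(w, upstream, clip=1.0):
    """Backward pass of a hard-tanh-style STE.

    The quantizer has zero gradient almost everywhere, so the STE
    pretends it is the identity inside [-clip, clip] and blocks the
    gradient outside that range.
    """
    return upstream if abs(w) <= clip else 0.0


def sgd_step(w, upstream, lr=0.1, clip=1.0):
    """Update the latent full-precision weight with the STE gradient."""
    return w - lr * ste_grad(w, upstream, clip)
```

The latent weight stays in full precision during training; only its ternarized value is used in the forward pass.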

Adapting Mamba Models for Deployment on Microcontrollers

Enabling Linear-Time Sequence Modeling on Ultra-Low-Power Tiny Devices

Bachelor thesis (2026) - B. Drabiński, Q. Wang, B. Yang, M.A. Neerincx

As machine learning expands into diverse domains, TinyML has emerged as a crucial paradigm for deploying models on highly resource-constrained microcontrollers, which typically feature less than 256 KB of RAM. However, executing complex mathematical operations on these devices remains a significant challenge, necessitating novel model designs and hardware-aware optimization.
The Mamba architecture, built around State-Space Models, is a promising candidate due to its compact parameterization and strong performance on long-context tasks. Nevertheless, Mamba was originally designed for highly parallelized GPUs, making its adaptation for TinyML non-trivial. This paper evaluates Mamba deployment strategies on microcontrollers using TensorFlow Lite Micro.
We propose architecture modifications and optimization techniques tailored specifically to microcontroller constraints. Our deployment of a quantized Mamba model achieves a 60.4 KB peak RAM footprint on a Keyword Spotting task, a 74% memory reduction compared to state-of-the-art work (MambaLite-Micro). Furthermore, we analyze the trade-offs of quantization, demonstrating that while it substantially reduces memory, it can introduce latency overhead on hardware that lacks acceleration for INT8 operations.
To mitigate code size and loop-unrolling overheads, we introduce a model-splitting technique that enables the execution of larger models. Our findings demonstrate that while Mamba is a viable architecture for TinyML, further research is required to fully optimize State Space Model implementations for edge hardware. ...

Through-Screen Finger Localization and Tracking using Reflected Light

Bachelor thesis (2026) - A. Croitoru, Qing Wang, Braden Refalo, I.M. Olkhovskaia

Visible light positioning systems conventionally fix the light sources on a ceiling and let a receiver move through the scene. We invert this geometry by tracking a hovering finger above a transparent segmented OLED display placed above four photodiodes. We track the finger's position from the four photodiode signals, which are influenced by the light the finger reflects. The motivating application is pre-touch sensing on mobile devices, where anticipating the user's next touch during the hover-to-touch window lets the system pre-load content. The central question is whether four under-screen photodiodes can localize and track a hovering finger in real time using only the microcontroller already driving the screen, avoiding the deep neural networks that prior through-screen sensing required. We collected a reflected-light dataset of 199 finger captures across a 10×4 calibration grid and evaluated localization on a 5×2 cell grid. After subtracting a temporally interpolated no-finger baseline, we build an 18-dimensional feature vector and classify the cell with a two-stage logistic-regression head that predicts column and row independently. This reaches 77.2% cell accuracy under a random split and 66.8% under a leave-one-calibration-dot-out protocol. The latter is more representative of deployment because every recording of the tested position is withheld from training. The complete pipeline runs on an Arduino Due that also drives the screen, with sub-millisecond inference. We conclude that through-screen reflected light carries enough spatial information for cell-level finger localization without deep learning, on the same embedded hardware that runs the display. ...
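
The baseline-subtraction step mentioned in this abstract can be sketched as follows. The sketch assumes linear interpolation between two no-finger captures taken before and after the finger capture; the abstract only says the baseline is "temporally interpolated", so the interpolation scheme, the function names, and the sample values are all illustrative assumptions.

```python
def interpolate_baseline(t, t_before, base_before, t_after, base_after):
    """Estimate the no-finger reading of each photodiode at time t.

    Assumes (hypothetically) linear drift between two baseline captures.
    """
    if t_after == t_before:
        return list(base_before)
    alpha = (t - t_before) / (t_after - t_before)
    return [b0 + alpha * (b1 - b0) for b0, b1 in zip(base_before, base_after)]


def subtract_baseline(reading, baseline):
    """Remove the estimated ambient/display contribution per photodiode."""
    return [r - b for r, b in zip(reading, baseline)]


# Four photodiodes, baselines captured at t = 0 s and t = 10 s,
# finger capture at t = 5 s (all values are made up for illustration).
baseline = interpolate_baseline(5.0, 0.0, [100, 200, 300, 400],
                                10.0, [110, 220, 290, 400])
residual = subtract_baseline([120, 215, 300, 398], baseline)
```

The residual is what a feature vector would be built from; how the 18 features are derived is not stated in the abstract and is not reproduced here.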

Efficient Embedded Intelligence

Exploring the Width-Precision Trade-Off in Binary-Quantized Vision Transformers

Bachelor thesis (2026) - I.S. van Loon, B. Refalo, Q. Wang, I.M. Olkhovskaia

Vision Transformers perform strongly across computer vision tasks but often require too much compute and memory for embedded deployment. Binary quantization cuts these costs by constraining weights and activations to a single bit, at the expense of accuracy. We investigate whether the budget freed by binarization can be reinvested into additional model width to recover that lost accuracy. Using the BHViT-Tiny architecture on the Oxford-IIIT Pet dataset, we first isolate the accuracy gap caused by quantization alone by comparing a full-precision reference against its binarized counterpart at identical width, and then scale width within the freed budget to measure how much of this gap can be recovered by width. We find that binarization at the base width costs 7.1 points of Top-1 accuracy, and that tripling the width recovers 4.9 of these points while remaining at a theoretical 3.5× and 6.7× reduction in memory and compute relative to the full-precision reference. The wider binary model thus approaches full-precision accuracy at a fraction of its cost. Additionally, keeping the downsampling layers in full precision recovers a further 1.1 points at a cost still well within budget, narrowing the gap to 1.1 points and indicating that part of the residual loss stems from a precision bottleneck rather than from a global lack of capacity. Our results establish width scaling as an effective strategy for reducing the binarization accuracy gap, offering a promising path toward the resource-constrained deployment of Vision Transformers. ...

Multi-Object State Estimation using Probabilistic Belief-Based Trackers

Connecting Low-Frequency Detection and High-Rate Prediction on Embedded Devices

Bachelor thesis (2026) - V. Mashkov, N. Kumar, Q. Wang, B. Refalo

Agents require object-centric world models to enable Active Inference, where decisions minimize "surprise". Maintaining high-frequency state estimation on edge hardware presents a dilemma between detection accuracy and update frequency. Traditional tracking frameworks are designed for high-frequency data and fail to bridge the large spatial uncertainty gaps that accumulate during low-frequency detection. We propose a Probabilistic Belief Tracker that decouples high-frequency belief propagation from low-frequency perception. The system utilizes a Gaussian Sum Filter with Interacting Multiple Model-inspired dynamics to maintain competing motion hypotheses, representing multi-modal spatial uncertainty during detection gaps. Our results demonstrate that switching to these probabilistic beliefs provides the high-frequency continuity and identity stability required by a reliable world model. Deployment on the NVIDIA Jetson Nano confirms the architecture is viable for real-time edge deployment, while MOT17 benchmarks show that using 6× fewer detections (5 FPS) drops tracking accuracy by 10.7% and identity stability by 9.2%, relative to the 30 FPS baseline. Limiting identity switches to 172 at a 5 FPS detection rate confirms that probabilistic continuity preserves identity stability, despite the reduction in overall tracking accuracy typical of sparse detections. ...

Transformer Inference using MAD vs LUT Kernels

A Comparative Benchmark of MAD and LUT Kernels for Binary and Ternary Dot Products on CPU and Edge Platforms

Bachelor thesis (2026) - M.B. Eren, B. Refalo, Q. Wang, I.M. Olkhovskaia

Quantizing Transformer weights to binary or ternary values reduces the inner product to sign manipulation and zero masking, prompting two competing CPU kernel strategies: multiply-add (MAD) and table lookup (LUT). Prior work reports end-to-end speedups but confounds the comparison across data layout, quantization format, and table depth simultaneously.

This thesis isolates the trade-off by sweeping LUT depth as a single controlled variable, spanning matrix sizes across the cache hierarchy and attributing results through roofline analysis on an x86 platform with AVX2 and roofline plus hardware-counter analysis on an ARM edge platform with NEON. The LUT advantage proves conditional: binary throughput rises monotonically with depth to 104.4 GOPS, roughly 2.6 times the strongest MAD baseline, while ternary gains are narrower and erode once the table outgrows fast cache or forces a gather. Throughout, instruction throughput, not bandwidth, is the binding limit. ...
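
The difference between the two kernel strategies can be sketched for the binary case. In this minimal illustration, weights are in {-1, +1}; the MAD variant multiplies and accumulates element by element, while the LUT variant precomputes, for each group of `depth` activations, every possible signed sum and then indexes the table with the packed weight bits. The function names and the bit convention (set bit means +1) are assumptions made for this example, and plain Python says nothing about the throughput figures reported in the thesis.

```python
def mad_dot(weights, acts):
    """Multiply-add baseline: one multiply and one add per element."""
    return sum(w * a for w, a in zip(weights, acts))


def build_tables(acts, depth):
    """For each group of `depth` activations, tabulate all signed sums."""
    tables = []
    for start in range(0, len(acts), depth):
        group = acts[start:start + depth]
        table = []
        for pattern in range(1 << len(group)):
            total = 0
            for i, a in enumerate(group):
                # Bit i set means weight +1, bit i clear means weight -1.
                total += a if (pattern >> i) & 1 else -a
            table.append(total)
        tables.append(table)
    return tables


def pack_weights(weights, depth):
    """Pack each group of `depth` binary weights into one table index."""
    indices = []
    for start in range(0, len(weights), depth):
        index = 0
        for i, w in enumerate(weights[start:start + depth]):
            if w == 1:
                index |= 1 << i
        indices.append(index)
    return indices


def lut_dot(packed, tables):
    """Table-lookup kernel: one lookup and one add per group."""
    return sum(table[idx] for idx, table in zip(packed, tables))


weights = [1, -1, -1, 1, 1, 1, -1, 1]
acts = [3, -2, 5, 7, -1, 4, 0, 2]
result = lut_dot(pack_weights(weights, 4), build_tables(acts, 4))
```

The tables depend only on the activations, so in a matrix-vector product they can be built once and reused for every weight row. Each table has 2^depth entries, which is consistent with the abstract's observation that gains erode once the table outgrows fast cache.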

Structured Degradation in Visible Light Positioning

Modeling and Compensation of Long-Term Degradation in RSS-Based VLP System

Bachelor thesis (2026) - J. van Arkel, Q. Wang, S. Zhuge, Bo Yang, M.A. Neerincx

Visible Light Positioning (VLP) uses LEDs for accurate indoor localization. However, structured illumination drift caused by LED aging, optical contamination, thermal effects, blockages, and device failures can reduce the long-term accuracy of RSS-based VLP systems. This thesis investigates how this drift can be modeled and compensated for using lightweight algorithms suitable for microcontrollers.

The proposed method combines scaling-based compensation for gradual degradation with anomaly detection for sudden degradation events such as broken LEDs. This method is tested through a long-term deployment simulation using the DenseVLC dataset and is also implemented on a Raspberry Pi Pico to assess embedded feasibility. The results show that VLP systems suffer increasing errors over time, while degradation-aware compensation improves long-term robustness. However, embedded deployment introduces accuracy trade-offs due to quantization and memory constraints.

These results show that modeling and compensating for degradation mechanisms is important for reliable long-term VLP deployment, and that compensation methods need to account for both gradual and sudden changes in received signal strength. ...

Embedded Trustworthy AI for Healthcare

A Multi-Objective Study of Fairness, Privacy, and Efficiency under TinyML Constraints

Bachelor thesis (2026) - L. Tompea, Q. Wang, M.A. Neerincx

The growing deployment of AI-assisted diagnostics on resource-constrained microcontrollers raises an underexplored question: do the memory and latency limitations of embedded hardware reshape the fairness–accuracy–privacy trade-offs that practitioners must navigate in healthcare applications? We present a controlled, multi-objective empirical study evaluating Gaussian noise injection, post-training INT8 quantization, and classification threshold calibration. Fairness and privacy interventions are evaluated on the Pima Indians Diabetes dataset (768 samples, age-stratified protected group) using a lightweight MLP and a logistic regression baseline; quantization efficiency is additionally validated on a larger hospital readmission dataset (∼100,000 samples, ∼154,800-parameter model) to characterise scale-dependent compression behaviour. The key findings are fourfold: (i) INT8 quantization efficiency is scale-dependent: no benefit and up to 67% fairness degradation at sub-300 parameters, versus 3.87× compression and 3.6× speedup at ∼155k parameters; (ii) low-magnitude noise (σ = 0.05) is a safe privacy proxy with negligible accuracy cost; (iii) higher noise levels create a non-monotonic privacy–fairness tension, destabilising group-level fairness without predictably improving it; (iv) post-hoc threshold calibration to τ = 0.7 reduces the equalized-odds gap by a relative 18.4% at a cost of only 1.2 pp in accuracy, outperforming all training-time interventions with zero embedded overhead. These findings show that embedded constraints do not introduce new fairness–accuracy trade-offs but shift design priorities toward post-deployment calibration. ...

Exploring the feasibility of short-range VLC schemes in MIMO systems

Otsu Thresholding and Sliding Window Protocols

Bachelor thesis (2026) - Alexandru Lolea, Q. Wang, A. Kiste, M.A. Neerincx

With radio communication bandwidth becoming increasingly scarce and expensive, researchers have turned toward the light medium, namely the field of Visible Light Communication (VLC). Although VLC was pioneered in the late 1800s, it faced criticism from scientists of that era, with radio communications being preferred instead. VLC has since regained attention by complementing existing radio communication methods. This research paper focuses on exploring different short-range multiple-input multiple-output (MIMO) screen-to-camera VLC schemes operating solely on the red optical channel. The transmitting screen is a 4×6 LED grid on a prototype board, while the receiver is an off-the-shelf smartphone back camera. The chosen modulation technique is on-off keying (OOK) with Manchester encoding (ME), while demodulation is performed using three different strategies, the first two using Otsu thresholding and the last using a sliding window approach. Our experiments show that, while the modulation scheme achieves a transmission rate of 6 symbols per LED per frame (up to 144 symbols per frame) and a bit error rate (BER) of less than 10⁻¹, the limited resolution and frame rate make it difficult to reliably include important data frame header fields such as the sequence number. ...
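
The combination of OOK with Manchester encoding and Otsu thresholding can be sketched for a single LED. The sketch assumes the convention that bit 1 is sent as on-then-off and bit 0 as off-then-on (the opposite convention is equally common), and it applies Otsu's method to a one-dimensional list of 8-bit intensity samples rather than to camera frames. All function names and sample values are illustrative assumptions, not the pipeline used in the thesis.

```python
def manchester_encode(bits):
    """Encode bits as symbol pairs; assumed convention: 1 -> (1, 0), 0 -> (0, 1)."""
    symbols = []
    for b in bits:
        symbols.extend((1, 0) if b else (0, 1))
    return symbols


def manchester_decode(symbols):
    """Invert manchester_encode; reject pairs without a transition."""
    bits = []
    for first, second in zip(symbols[0::2], symbols[1::2]):
        if first == second:
            raise ValueError("invalid Manchester pair")
        bits.append(1 if first == 1 else 0)
    return bits


def otsu_threshold(samples, levels=256):
    """Return the threshold t maximizing between-class variance.

    Samples <= t form the dark class, samples > t the bright class.
    """
    hist = [0] * levels
    for s in samples:
        hist[s] += 1
    total = len(samples)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def demodulate(intensities):
    """Binarize intensity samples with Otsu, then Manchester-decode."""
    t = otsu_threshold(intensities)
    return manchester_decode([1 if s > t else 0 for s in intensities])


bits = [1, 0, 1, 1, 0]
symbols = manchester_encode(bits)
# Made-up received intensities: bright near 205-232, dark near 20-38.
intensities = [(205 + 3 * i) if s else (20 + 2 * i) for i, s in enumerate(symbols)]
```

Because Manchester encoding guarantees equal numbers of on and off symbols, the intensity histogram is balanced between the two classes, which is the situation Otsu's method handles best.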

Embedded Spacecraft Fault Detection

A Hitchhiker's Guide to Explainable Thermal Anomaly Alerts for Downlink-Constrained Space Missions

Master thesis (2026) - A.J. Phillips, S. Speretta, Q. Wang, E. Mooij, A. Caon

Small satellites increasingly produce more housekeeping telemetry than can be continuously downlinked or inspected, delaying operator awareness of emerging spacecraft-health issues. This thesis develops an explainable on-board thermal anomaly-alerting pipeline for downlink-constrained small-spacecraft missions, using Delfi Twin as a case study. Rather than proposing a stand-alone anomaly-detection algorithm, it defines a deployment pathway linking telemetry scope, anomaly semantics, synthetic event-level evaluation, residual-to-alert decision logic, compact alert packets, and STM32L4-class embedded verification. A lightweight expected-temperature predictor is combined with residual scoring, cumulative evidence, persistence, hysteresis, transient-spike suppression, and explicit gap termination to form bounded detector events. A labelled synthetic benchmark enables quantitative evaluation, while FUNcube-1 telemetry provides qualitative stress evidence on real on-orbit data. Under matched-predictor conditions, all alert-worthy synthetic events were recovered, and STM32L4 replay demonstrated ample timing and memory margins. Flight performance and autonomous operational trust remain future validation tasks. ...

Fracture: Split Inference for Transformers on Embedded Devices

Master thesis (2025) - H.A. Bhatt, Q. Wang, G. Iosifidis

Efficient Eye Tracking Using Near-Eye Event Cameras: From Event-based Detection to Rapid Updates

Master thesis (2025) - J. Liu, G. Lan, Q. Wang, L. Du

Eye tracking is a cornerstone technology for next-generation human-computer interaction, particularly in Extended Reality (XR) and healthcare applications. However, traditional frame-based eye tracking systems are constrained by latency, power consumption, and motion blur. Event cameras offer a promising alternative with their high temporal resolution, high dynamic range, and low data redundancy, but existing event-based methods often struggle to balance tracking accuracy, computational efficiency, and robustness, especially on resource-constrained mobile hardware.

This thesis addresses these challenges by proposing a novel, purely event-based eye tracking pipeline designed for high-frequency performance and robust accuracy within a strict computational budget. The pipeline accepts only event streams and estimates the pupil region in the field of view. The core contribution is a dual-state framework that synergistically combines a deep learning-based pupil detector with a lightweight, rapid template updater. For robust detection, a lightweight, attention-augmented segmentation network, named PupilUNet, is developed. It leverages a truncated MobileNetV3 Small encoder and a parameter-free attention mechanism to accurately segment the pupil boundary from Speed-Invariant Time Surface (SITS) representations, which provide a stable input by normalizing for motion speed. To overcome the scarcity of annotated data, a comprehensive framework is introduced to augment a large-scale training dataset from limited initial labels. Once a high-confidence pupil template is detected, the system transitions to a rapid updating mode, employing an optimized, vectorized point-to-edge matching algorithm to track the pupil at kilohertz frequencies with millisecond latency. A dynamic control logic monitors tracking quality and seamlessly reverts to the robust detection mode when necessary, ensuring both speed and resilience.

Experimental results on the EV-Eye dataset validate the pipeline’s effectiveness. The PupilUNet detector achieves a P5 accuracy of 96.3% (pupil center error < 5 pixels), while the rapid updater operates with an average latency of approximately 1 ms. The lightweight PupilUNet model contains merely 0.177 M parameters and requires 0.553 GFLOPs per inference. The fully integrated system sustains a P5 accuracy of 85.2% while achieving a peak tracking frequency of over 960 Hz. This work demonstrates a practical and efficient solution that successfully navigates the trade-offs between accuracy and latency, establishing a new baseline for high-performance, event-based eye tracking on mobile and embedded systems. ...

Split Inference of Transformer

Master thesis (2025) - L. Hu, Q. Wang, J. Yang

With the increasing demand for artificial intelligence (AI), intelligent systems have become deeply integrated into various aspects of modern life, including autonomous driving, smart assistants on mobile devices, and powerful online language models such as ChatGPT. In addition, the emergence of generative models for text and image synthesis has significantly reduced the cost of accessing and interacting with information. However, the advancement of these AI applications comes at the cost of ever-growing computational and memory requirements. These requirements pose substantial challenges even for high-end computing systems, and become prohibitive when deploying AI models on resource-constrained platforms such as embedded devices and Internet of Things (IoT) nodes. This thesis presents a distributed inference framework for Transformer models where we design a fine-grained, channel-wise parameter partitioning scheme. Importantly, the implementation of our framework is also independent of conventional AI frameworks such as PyTorch, making it efficient, portable, and adaptable to virtually any compute-capable device. We begin by analyzing the computational and memory limitations of mainstream hardware platforms, highlighting the motivation to aggregate multiple low-power devices to collectively execute AI workloads. Through software-based simulation, we validate the correctness of the partitioned inference scheme and demonstrate that it introduces no functional deviation from unpartitioned single-device execution. The simulation also facilitates precise estimation of data flow and compute demand across multiple collaborating devices. Furthermore, we introduce a full-stack load balancing algorithm that enables adaptive task allocation based on heterogeneous hardware specifications, taking into account factors such as bandwidth, memory capacity, and communication latency.
In summary, this thesis proposes a split, practical, and high-granularity Transformer inference framework that is compatible with heterogeneous hardware configurations, offering a promising step toward enabling AI inference distributively on a network of resource-constrained embedded platforms. ...
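
The idea that a partitioned layer can match single-device execution exactly can be illustrated with a minimal sketch. It interprets "channel-wise" as splitting a linear layer's weight matrix by output channel (row), so each device computes a slice of the output and the slices are concatenated. That interpretation, the even contiguous split, and all function names are assumptions for illustration; the thesis's actual partitioning and load-balancing scheme may differ.

```python
def matvec(weight, x):
    """Dense matrix-vector product: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def partition_rows(weight, n_devices):
    """Split output channels into contiguous, near-equal shards."""
    size = -(-len(weight) // n_devices)  # ceiling division
    return [weight[i:i + size] for i in range(0, len(weight), size)]


def split_matvec(weight, x, n_devices):
    """Simulate each device computing its shard, then gather the results."""
    output = []
    for shard in partition_rows(weight, n_devices):
        # Each device needs the full input x but only its own rows.
        output.extend(matvec(shard, x))
    return output


weight = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
x = [1, -1]
reference = matvec(weight, x)
partitioned = split_matvec(weight, x, 2)
```

With integer inputs the two results are identical element for element. A heterogeneous deployment would replace the even split with shard sizes chosen per device.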

BLE Relay Attack Mitigation Using Multi-Antenna Bluetooth 6.0 Channel Sounding

Master thesis (2025) - S. van de Water, Q. Wang

This thesis researches mitigations for BLE relay attacks. A design for a time-based distance bounding protocol using the Bluetooth channel sounding feature introduced in the new Bluetooth 6.0 core specification is presented. Bluetooth channel sounding is composed of two distance measurement techniques: Phase-Based Ranging (PBR) and Round-Trip Time (RTT). The proposed protocol requires consistent channel sounding distance measurements in order to minimize the likelihood of successful relay attacks. Single-antenna channel sounding measurements have shown poor spatial and sequential consistency in a complex multipath office environment. In order to overcome inaccuracies that arise due to multipath propagation, this thesis investigates the optimal antenna configuration for Bluetooth channel sounding using multiple antennas. A comparison between the root-mean-square error and maximum error of the single-antenna baseline and the proposed multi-antenna solution for both spatial and sequential consistency in a complex multipath office environment shows that there is, on average, a 58% reduction in error metrics when the optimal multi-antenna setup is used. The performance of the optimal multi-antenna channel sounding setup in the complex environment approaches the single-antenna baseline performance in an ideal outdoor environment. This shows that the added antenna diversity successfully overcomes the negative effects due to multipath propagation. ...

TinyML-Empowered Indoor Positioning with Light

Model Optimization using Neural Architecture Search

Bachelor thesis (2025) - N. Lodha, Q. Wang, R. Zhu, R.R. Venkatesha Prasad

Visible light positioning (VLP) systems are a promising solution for indoor positioning, utilizing light-emitting diodes (LEDs) as transmitters and photodiodes (PDs) as receivers.
The accuracy of a received signal strength (RSS) based VLP system depends heavily on the density of collected fingerprints, and collecting them is a very labor-intensive process.
In this study, we focus on RSS fingerprints to achieve centimetre-level positioning accuracy, while addressing the challenges of labor-intensive fingerprint collection and deployment on resource-constrained devices like the Raspberry Pi Pico microcontroller.
We found different neural network architectures using Neural Architecture Search (NAS) to optimize the VLP system, which achieve on average a 12 mm positioning error with a low inference latency of around 50 ms on the Raspberry Pi Pico. ...

Real-Time Traffic Sign Recognition on Microcontrollers

Bachelor thesis (2025) - A.E. Celen, Q. Wang, R. Zhu, R.R. Venkatesha Prasad

Real-time traffic sign recognition on microcontrollers introduces challenges due to limited memory and processing capacity. This study investigates the trade-offs between model size, classification accuracy, and inference latency within hardware constraints. We present an efficient network architecture called AykoNet with two variants: AykoNet-Lite, prioritizing model size and inference latency, and AykoNet-Pro, prioritizing classification accuracy. We trained AykoNet on the German Traffic Sign Recognition Benchmark (GTSRB) and specifically optimized it for deployment on the Raspberry Pi Pico microcontroller. AykoNet-Lite delivers 94.60% accuracy with only a 36.80KB model size and 55.34ms inference time, while AykoNet-Pro achieves 95.90% accuracy with an 80.18KB model size and 87.13ms inference time. Our approach demonstrates the effectiveness of domain-specific preprocessing and architectural design, class-aware data augmentation, and the strategic use of depthwise separable convolutions. These results validate the feasibility of real-time traffic sign recognition in resource-constrained embedded systems. Specifically, AykoNet-Lite strikes an optimal balance between model size, classification accuracy, and inference latency for practical deployment in autonomous navigation applications. ...

TinyML-Based Adaptive Speed Control for Car Robot

A Comparative Approach

Bachelor thesis (2025) - A.D. Petriceanu, Qing Wang, R.R. Venkatesha Prasad

This work investigates the feasibility of performing monocular depth estimation on highly resource-constrained hardware, specifically the Raspberry Pi Pico Zero microcontroller. In contrast to existing approaches that rely on large convolutional networks and high-performance devices, this study explores a set of custom lightweight encoder-decoder architectures, including one inspired by L-ENet, L-EfficientUNet, μPyD-Net, and an LSTM-μPyD-Net combination, designed to operate within strict memory limits. These models were trained on a preprocessed KITTI dataset, with either LiDAR depth maps or SGM (Semi-Global Matching) dense depth maps, and evaluated in terms of accuracy, model size, and real-time inference performance. Results demonstrate that meaningful depth prediction is achievable on microcontrollers, paving the way for low-cost autonomous navigation systems and broader applications of TinyML in embedded robotics. SGM proved to be the best preprocessing technique, and the LSTM-μPyD-Net had the best accuracy when trained on the full Train split of the KITTI dataset. ...

TinyML-Empowered Line Following for a Car Robot

Evaluating the Capabilities of Various Lane Detection Models on Microcontrollers

Bachelor thesis (2025) - A.J.A. Carton de Wiart, Q. Wang, R. Zhu, R.R. Venkatesha Prasad

This research explores the feasibility of implementing lane detection on lightweight microcontrollers using a combination of traditional image processing and compact machine learning methods. With the aim of enabling real-time inference under strict hardware constraints, several models were trained and evaluated against a custom image processing pipeline. Each approach was tested for accuracy, speed, and resource usage on the Raspberry Pi Pico 0 microcontroller. While these solutions fall short of cutting-edge accuracy and cannot process as much information as state-of-the-art models, their low cost, minimal power consumption, and real-time performance highlight their potential. These findings suggest that lightweight lane detection is a viable direction for further research in embedded autonomous systems.
...

TinyML-Empowered Indoor Positioning with Light

A Study on the Impact of LED Aging and Failure

Bachelor thesis (2025) - J.W. Li, Q. Wang, R. Zhu, R.R. Venkatesha Prasad

Visible light positioning (VLP) enables accurate indoor localization by leveraging a dense deployment of LEDs in future lighting infrastructure, but its widespread adoption is hindered by two key challenges: the need for densely sampled fingerprint datasets and performance degradation due to LED aging or failure. In this work, we propose a VLP framework that reduces reliance on dense fingerprinting and remains robust over time without requiring manual re-fingerprinting. Using a dataset acquired from the DenseVLC testbed, we evaluate preprocessing techniques that enhance positioning accuracy under noisy received signal strength (RSS) measurements. To address long-term reliability, we introduce a simulation framework that models LED degradation and sudden failures. Most importantly, we present an online learning approach that dynamically adapts the positioning model in response to environmental and infrastructure changes.
In our simulations, this approach maintains the original level of accuracy despite aging effects. In some cases, it yields up to a 95% improvement when evaluated over longer timespans. Furthermore, our preprocessing contributions have led to a 30% improvement over baseline performance without aging. Our results demonstrate a path toward scalable, self-sustaining VLP systems suitable for real-world deployment. ...