M.A. Zuñiga Zamalloa

75 records found

Bachelor thesis (2026) - M. Šmitas, H. Liu, M.A. Zuñiga Zamalloa
Running deep learning models directly on microcontroller units (MCUs)—a field known as TinyML—enables artificial intelligence in energy-restricted applications that rely strictly on limited battery power or harvested local energy. Creating models for these constrained devices requires strict optimization via neural architecture search (NAS) frameworks such as µNAS. However, traditional proxies like multiply-accumulate (MAC) counts fail to serve as accurate energy predictors because they overlook complex hardware interactions and compiler-level runtime optimizations introduced by deployment engines such as TensorFlow Lite (TFLite) Micro.

In this paper, we explore accurate energy estimation within the µNAS framework. We build an automated hardware-in-the-loop (HIL) profiling pipeline to deploy diverse architectures on an MCU and record their physical power draw, generating a dataset of 671 unique models. A baseline linear regression predictor applied to MAC counts achieves a high macro-level fit (R² = 0.985) but suffers from an unacceptable mean absolute percentage error (MAPE) of 85.4% due to structural oversights.

To address this limitation, we propose a novel energy estimator based on a directed acyclic graph neural network (DAGNN). By processing neural network topology directly, the DAGNN learns complex hardware interactions and runtime optimization behaviors. Our estimator substantially outperforms the baseline, reducing MAPE from 85.4% to 16.0%. ...
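The gap between a high R² and a poor MAPE reported above can be reproduced on a toy example. The following is a minimal sketch using synthetic numbers that are assumptions for illustration only (they are not measurements from the thesis): when one large model dominates the variance, a straight line fits it almost perfectly while the small models keep large relative errors.

```python
# Illustrative only: synthetic values, NOT data from the thesis.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def mape(ys, preds):
    """Mean absolute percentage error, returned as a fraction."""
    return sum(abs((y - p) / y) for y, p in zip(ys, preds)) / len(ys)

# Hypothetical MAC counts (in thousands) and energy readings (arbitrary units).
macs_k = [1, 2, 3, 4, 1000]
energy = [5, 1, 6, 2, 1000]

slope, intercept = fit_line(macs_k, energy)
preds = [slope * x + intercept for x in macs_k]
r2 = r_squared(energy, preds)
err = mape(energy, preds)
print(f"R^2 = {r2:.4f}, MAPE = {err:.1%}")
```

R² is dominated by the absolute error on the largest model, whereas MAPE weights every model equally, which is why the two metrics can disagree this strongly.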

Implementation and Evaluation of Depthwise Convolution on Microcontrollers

Artificial Intelligence is increasingly being used in everyday devices. However, most AI systems are designed to run on powerful computers or cloud servers rather than on small, low-power devices such as microcontrollers. Running AI directly on these devices can reduce energy consumption and enable systems to operate without an internet connection. AIfES (Artificial Intelligence for Embedded Systems) is a machine learning framework that allows neural networks to be trained directly on microcontrollers. However, it currently lacks support for depthwise convolution, an important operation used in efficient neural network architectures such as MobileNet. As a result, many modern computer vision models cannot be trained within the framework.

This project extends AIfES with support for depthwise convolution and integrates the new operator into the existing training pipeline. The implementation was validated using a combination of manually verified test cases, comparisons with TensorFlow, and image classification experiments on embedded hardware. The results show that the new operator functions correctly during both inference and training. Models containing the implemented layer successfully learned classification tasks and achieved behavior similar to equivalent TensorFlow models. By adding support for depthwise convolution, this work expands the range of neural network architectures that can be trained directly on microcontrollers and contributes to making on-device AI more practical and flexible. ...
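For readers unfamiliar with the operator, the sketch below shows what a depthwise convolution computes in its forward pass. It is written in plain Python for clarity and is not the AIfES implementation (AIfES is a C framework); the function name, the channel-first layout, stride 1, and the absence of padding are all assumptions made for this illustration. As in most deep learning frameworks, the kernel is applied without flipping (cross-correlation).

```python
def depthwise_conv2d(x, kernels):
    """Depthwise convolution, valid padding, stride 1 (illustrative sketch).

    x:       input as nested lists with shape [C][H][W]
    kernels: one 2D kernel per channel, shape [C][kH][kW]
    Each channel is filtered only by its own kernel; channels are never mixed.
    """
    if len(x) != len(kernels):
        raise ValueError("need exactly one kernel per input channel")
    out = []
    for channel, kernel in zip(x, kernels):
        kh, kw = len(kernel), len(kernel[0])
        oh = len(channel) - kh + 1
        ow = len(channel[0]) - kw + 1
        out.append([
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
                for j in range(ow)
            ]
            for i in range(oh)
        ])
    return out

# Two 2x2 channels, each with its own 2x2 kernel.
x = [[[1, 2], [3, 4]],
     [[1, 1], [1, 1]]]
kernels = [[[1, 0], [0, 1]],
           [[2, 2], [2, 2]]]
result = depthwise_conv2d(x, kernels)
```

Because every channel uses a single kernel, the operator needs C·kH·kW weights instead of the C_in·C_out·kH·kW weights of a standard convolution, which is why architectures such as MobileNet rely on it.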

Evaluating Real-Time Performance of Embedded Millimeter-Wave Radar Pre-Processing Pipelines

Automatically tracking the positioning and alignment of human limbs, also known as Human Pose Estimation (HPE), was traditionally pioneered by camera-based systems like the Microsoft Kinect, and remains critical across domains from interactive gaming to healthcare patient monitoring. Millimeter-wave (mmWave) radar has emerged as a compelling alternative; by utilizing electromagnetic waves to detect points on the surface of objects, it offers a more cost-effective, privacy-preserving, and robust solution than traditional cameras. However, the spatial "point-clouds" generated by mmWave radars are particularly irregular, requiring pre-processing before they can be fed into deep learning models. While these pre-processing techniques are well-documented and can easily be implemented in high-level environments like Python, adapting and optimizing these pipelines for low-power embedded devices remains an underexplored challenge. It is currently not clear whether point-cloud pre-processing can overcome the memory and computational restrictions of low-power devices.
This thesis profiles the memory footprint and latency of executing mmWave point-cloud pre-processing on micro-controllers, specifically an STM32 Cortex-M7 with 320 KB of SRAM, with the goal of real-time performance: processing each data sample in under 100 ms.
We propose and evaluate seven pipeline variants, incorporating hardware-acceleration, lightweight alternative algorithms, pipeline restructuring to eliminate computational redundancies, and a single-pass iteration strategy to minimize cache misses. Experimental results demonstrate that structural optimization compresses peak memory consumption from 90 KB to 50 KB, successfully approaching the theoretical lower bound dictated by the output buffers. Our most highly optimized configuration achieves an exceptional average latency of 8.13 ms (with a worst-case peak of 12 ms), comfortably satisfying our real-time constraints.
Further analysis revealed that the average point count per frame is the primary driver of computational performance. Ultimately, this work validates that efficient, real-time end-to-end radar processing is entirely viable on highly resource-constrained micro-controllers. ...

The Effects of Component Size on Model Accuracy, Latency and Memory Usage

Human-pose estimation is a technology with many applications such as healthcare, smart homes, and new methods of human-computer interaction. However, traditional RGB camera-based systems come with significant privacy risks and can perform poorly in dark rooms. A new approach to human-pose estimation, estimating through the use of mmWave radars, could solve these problems. mmWave creates a point cloud of a person, rather than a direct RGB image, and is therefore not affected by dark conditions, while simultaneously letting the subject stay anonymous. Current mmWave models are very accurate, on the order of centimetres, but generally too costly to run without a GPU.

In this paper, we create an optimised mmWave human-pose estimation model that runs more accurately without a GPU compared to a baseline model. We do this by analysing a baseline model to find which parts can be compressed without excessively losing accuracy.

Our improved model has an inference time of 41 ms with a Mean Absolute Error (MAE) of 7.72 cm on an embedded device. Compared to the baseline, this model reduces latency by 85.9%, at the cost of a 4.8% loss in MAE accuracy.

Through finding which parts can be compressed most effectively, we also gain insight into the relative importance of each component of the model. We also identify components that, with further research, could be improved to increase the accuracy of the model. ...

An Empirical Evaluation and Hierarchical Sensing Pipeline

Bachelor thesis (2026) - Z. Corbanie, M.A. Zuñiga Zamalloa, H. Liu, J.M. Weber
Embedded sensing systems that rely on energy harvesting (from sources such as electromagnetic radiation, thermoelectric energy, and kinetic energy) generally cannot harvest enough power to run most devices under normal operation, and thus operate under severe power constraints. To ensure sustainable, battery-free functionality, the microcontroller (MCU) must remain in a low-power deep-sleep state during idle periods. It is woken up by a sensor that sends an external hardware interrupt when an environmental event occurs. However, there is a trade-off between a sensor’s power consumption, detection range, accuracy, and latency. This paper presents two primary contributions: 1) an empirical evaluation of various sensor wake-up systems, and 2) the design and implementation of a multi-stage hierarchical event detection pipeline. This pipeline consists of an ultra-low-power coarse sensor that activates a high-accuracy but higher-power sensor, minimizing the current draw while staying reliable. ...
Master thesis (2026) - R. van Dijk, M.A. Zuñiga Zamalloa, Q. Wang
The radio-frequency spectrum is increasingly congested and costly to license, which motivates the use of complementary wireless links in other parts of the electromagnetic spectrum.
Visible Light Communications (VLC) transmits data by modulating visible light.
Among the receiver types used in this field, event cameras are attracting increasing attention due to their significantly higher rates than conventional cameras.
Recent work has studied event-camera VLC in either Line-of-Sight (LoS) or Non-Line-of-Sight (NLoS) settings, but has not combined both links in a single transmitter.
In applications such as infrastructure-to-vehicle communication, receivers may operate under both LoS and NLoS conditions, making it desirable to support both link types simultaneously.

This thesis presents a single LED-matrix transmitter that supports both a high-data-rate (high-fidelity) LoS stream and a low-data-rate (low-fidelity) NLoS stream simultaneously.
To this end, we introduce Dual On-Off Keying (DOOK), a multi-fidelity modulation scheme that encodes high-fidelity data in the spatial and temporal dimensions, while encoding low-fidelity data in the temporal dimension only.
We further combine DOOK with state-of-the-art modulation schemes and design flicker-free variants.
We evaluate the resulting trade-offs between throughput, Bit Error Rate, and flicker.

Using DOOK, we achieve 366 kbps on the LoS link and 2.9 kbps on the NLoS link with a BER below 10⁻³.
DOOK with Manchester encoding halves the throughput and produces the least flicker among the evaluated schemes.
Compared with prior work, our NLoS throughput is 1.7× higher, while our LoS throughput is 1.8× higher per channel.
More importantly, our system combines both links in a single transmitter. ...
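The observation above that Manchester encoding halves the throughput while producing the least flicker follows from how the code works, which a short sketch can illustrate. This is a generic Manchester encoder and decoder, not the thesis's DOOK implementation; the bit-to-chip mapping (1 as off-then-on, 0 as on-then-off, following the IEEE 802.3 convention) and the function names are assumptions made for this example.

```python
def manchester_encode(bits):
    """Map each data bit to two chips: 1 -> (0, 1), 0 -> (1, 0)."""
    chips = []
    for bit in bits:
        chips.extend((0, 1) if bit else (1, 0))
    return chips

def manchester_decode(chips):
    """Invert manchester_encode; reject pairs without a mid-bit transition."""
    if len(chips) % 2:
        raise ValueError("chip stream must have even length")
    bits = []
    for first, second in zip(chips[0::2], chips[1::2]):
        if first == second:
            raise ValueError("invalid Manchester pair")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits

payload = [1, 0, 1, 1, 0, 0]
chips = manchester_encode(payload)

# Fraction of time the light is on; it does not depend on the payload.
duty_cycle = sum(chips) / len(chips)
```

Each data bit occupies two chips, so the data rate is half the chip rate. Every bit also contributes exactly one "on" and one "off" chip, so the average brightness stays constant regardless of the data, which limits visible flicker.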
The widespread adoption of wireless communication devices has led to increasingly congested wireless networks, creating a need for alternative communication technologies. A promising alternative is Visible Light Communication (VLC), which, instead of using radio frequencies, leverages the visible light spectrum to enable data transmission. Unfortunately, using visible light to communicate comes with its own set of limitations, such as the relatively high power consumption of sustaining a light source. In pursuit of lower power consumption, passive VLC has recently been gaining attention. Passive VLC is a form of VLC that uses ambient light to transmit data. Typically, it does so by modulating sunlight or artificial lights using Liquid Crystals (LCs). However, LCs, amongst other things, have very limited modulation speeds. This limitation has prompted the search for transmitters capable of achieving higher modulation speeds. Recent works have investigated the use of Digital Micromirror Devices (DMDs), which can achieve significantly higher modulation speeds. Unfortunately, DMDs require precise alignment for both incoming and outgoing light, imposing strict alignment constraints. To avoid these constraints, we investigate alternative transmitter designs using mechanical actuators. This has led to the design of two novel passive VLC systems dubbed DiscoLink and SpeakerLink. The DiscoLink transmitter, which makes use of a stepper motor to oscillate a multitude of small mirrors, achieves a throughput speed of 66 bits per second. Meanwhile, SpeakerLink, which makes use of a voice coil to oscillate a DVD, achieves a throughput speed of 20 bits per second. Their unique designs are enabled by using an event camera as a receiver, leveraging its high spatial and temporal resolution. Both systems operate on the principle of oscillating a reflective surface to alternate between different origins of ambient light reflected toward the receiver. 
Therefore, unlike other passive VLC systems, these transmitters do not have traditional “on” and “off” states between which they can alternate. The designs also necessitate the development of a novel modulation scheme, for which we propose two distinct methods. Although neither DiscoLink nor SpeakerLink faces the alignment constraints of DMDs, they do face other challenges, such as low throughput speeds, reduced range, and noise. Whilst the practical use of these systems is limited due to these challenges, they highlight a new potential design space for passive VLC. ...
Millimeter-wave (mmWave) radar is a promising active sensing technology for Human Pose Estimation (HPE). However, its reliability is hindered by poor generalization in scenarios unseen during the model's training phase. This thesis presents a comprehensive empirical study to analyze and quantify the mmWave dimensions causing the poor generalization for HPE models. To enable this research, we introduce mmDiverse, a new large-scale dataset containing varied human movements, users, environments, and distances from the radar. Using this dataset, we evaluated two foundational models, Baseline MARS and a temporally aware version, Temporal MARS, through a series of experiments designed to isolate each dimension. The results reveal that human diversity is the most critical challenge, with model accuracy dropping by over 100% when encountering an unseen individual. Unseen movements pose the next significant challenge, where the models revert to the learned movements exposed during model training rather than generalizing to new kinematic movement patterns. Additionally, our study shows that the model's capacity to learn new kinematic patterns is enhanced by integrating model-centric techniques such as temporal modeling. This study also reveals that training in cluttered, noisy environments, combined with a target classification and tracking data pre-processing pipeline, is crucial for improving model robustness. Based on these findings, this thesis provides a set of evidence-based guidelines for developing more resilient mmWave-based HPE systems. This study concludes that building reliable applications requires prioritizing the collection of data with extensive user and movement diversity, captured across noisy and cluttered real-world environments. Furthermore, this diverse dataset should be used to train a temporal-aware model architecture incorporating a data pre-processing pipeline to mitigate generalization challenges. ...
Master thesis (2025) - W. Liang, M.A. Zuñiga Zamalloa, H. Liu, Q. Wang
Microcontroller-based neural network inference faces significant RAM constraints, hindering performance and deployment. One of the main constraints is the peak memory usage, which is essential for conducting deep learning inferences with low latency. To address this, previous researchers have developed peak memory estimators, whose estimates can be used by inference-time optimization techniques like pruning to tackle the RAM constraint. However, many of the peak memory estimators used by current state-of-the-art frameworks like µNAS and TinyEngine underestimate or overestimate, reducing the reliability of model decisions made under RAM constraints. Underestimation often arises from failing to account for all components contributing to peak memory usage, while overestimation can occur when extra memory overheads irrelevant in MCU-specific inference scenarios are incorrectly included. In this paper, we propose a peak memory estimator that works at the operator level and can accurately estimate the peak memory usage of a deep learning model during inference. The experiments show that our method achieves more accurate peak memory predictions across multiple MCU platforms and can be effectively integrated with pruning strategies to produce better model compression that satisfies both memory and accuracy constraints. Specifically, on a benchmark of 150 models, our method achieved an average estimation error margin of only 0.9%, significantly outperforming µNAS, which exhibited an average error margin of 61.7%. ...
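To make the notion of operator-level peak memory concrete, here is a deliberately simplified sketch. It is not the estimator proposed in the thesis: it assumes a purely sequential network in which each operator needs only its input and output tensors resident at the same time, and it ignores scratch buffers, in-place execution, and runtime overheads, which are exactly the kinds of components the abstract says naive estimators get wrong. The function name and the tensor sizes are hypothetical.

```python
def peak_activation_memory(tensor_bytes):
    """Naive peak memory for a sequential network (illustrative sketch).

    tensor_bytes[i] is the size in bytes of the tensor feeding operator i;
    tensor_bytes[i + 1] is that operator's output. While an operator runs,
    both tensors must be resident, so the peak is the largest such pair.
    """
    if len(tensor_bytes) < 2:
        raise ValueError("need at least one operator (two tensors)")
    return max(a + b for a, b in zip(tensor_bytes, tensor_bytes[1:]))

# Hypothetical activation sizes for a three-operator model.
sizes = [100, 400, 200, 50]
peak = peak_activation_memory(sizes)
```

A realistic estimator would add per-operator workspace and account for buffer reuse, so this function should be read as a lower-fidelity baseline rather than a usable predictor.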
Master thesis (2025) - S. Suresh, M.A. Zuñiga Zamalloa, H. Liu, J. Yang
There has been a steady increase in technologies that leverage Deep Learning (DL) techniques on resource-constrained devices for real-time processing. While DL techniques are adept at recognition tasks, their performance depends on the training process. Training data is seldom fully representative of the deployment scenario, requiring retraining to preserve accuracy. Many works have proposed on-device learning techniques that enable training DL techniques like Convolutional Neural Networks (CNNs) on microcontroller units (MCUs), without requiring data to be sent to the cloud. These methods have yielded promising accuracy improvements while training with low memory footprints. However, limited research has been done on the energy implications of doing so.

This thesis presents methods for energy-aware on-device learning on MCUs. It leverages the principle of updating specific layers of a CNN, proposed in past works, to fit memory constraints. We propose an energy-accuracy trade-off objective based on computational costs (in MACs) and accuracy improvement to select which layers to train. Furthermore, we demonstrate how computationally light search algorithms can adequately maximize the newly defined objective for layer selection. Evaluations show that our approach saves up to 200 mJ of energy on-device while yielding simulation accuracies similar to a recent study under the same conditions. ...
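One way to picture a computationally light layer-selection search is a greedy heuristic. The sketch below is an assumption-laden illustration and not the objective or algorithm from the thesis: it ranks layers by estimated accuracy gain per MAC and picks them until a MAC budget is exhausted. The layer names, costs, gains, and the budget are all hypothetical.

```python
def select_layers(layers, mac_budget):
    """Greedy layer selection by accuracy gain per MAC (illustrative sketch).

    layers: list of (name, training_macs, estimated_accuracy_gain) tuples.
    Returns the chosen layer names and the total MACs they consume.
    """
    ranked = sorted(layers, key=lambda layer: layer[2] / layer[1], reverse=True)
    chosen, used = [], 0
    for name, macs, _gain in ranked:
        if used + macs <= mac_budget:  # skip layers that would exceed the budget
            chosen.append(name)
            used += macs
    return chosen, used

# Hypothetical candidates: (layer, MACs to train it, estimated accuracy gain).
candidates = [("conv1", 100, 1.0), ("conv2", 400, 2.0), ("fc", 50, 1.5)]
selection = select_layers(candidates, mac_budget=200)
```

A greedy pass like this runs in O(n log n) time and needs no extra memory beyond the candidate list, which is the kind of cost profile that suits an MCU, although it does not guarantee an optimal selection.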
The broad adoption of integrated computing systems through the Internet of Things has led to a need for seamless, low-latency interfaces. Deploying these installations in public spaces, such as gaming areas in airports, presents unique challenges. Traditional input modalities are ill-suited for these environments. Cameras compromise user privacy, and wearables are susceptible to theft, damage or loss. Millimetre wave (mmWave) radars are an ideal sensor for such an interface, as they can operate through non-conductive materials and independently of lighting conditions. Most human sensing solutions for mmWave radars use Deep Learning models, which rely on large data sets for training and have a high computational overhead, limiting their feasibility on resource-constrained edge devices. To address these limitations, we present IAmMuse, a lightweight, signal-processing-based interpretation pipeline for human sensing using mmWave radar technology. We filter and enhance the raw point cloud data, using spatio-temporal density information, as well as kinematic context acquired through a few-shot online learning step. IAmMuse uses this pre-processed data to generate a stabilised prediction, classifying the user’s arm position. We implemented a musical system, controlled through these predictions, as an example application for the technology. The user selects musical notes by moving their arms to either a low, middle, or high position, similar to a conductor. To assess the efficacy of this method, we present a comparative analysis with a State-of-the-Art Human Pose Estimation model. This comparison shows that IAmMuse achieves a classification accuracy 50% higher than the State-of-the-Art model, while using less than 1% of the training data. This thesis validates the viability of non-deep-learning-based interpretation algorithms for human sensing with mmWave radars through a fully functional prototype. ...
Doctoral thesis (2025) - Hanting Ye, M.A. Zuñiga Zamalloa, Q. Wang
The advancement in transparent screen technology has promoted the adoption of full-screen design on mobile devices, reducing the area occupied by optical sensors to maximize the devices' screen-to-body ratio. In modern smartphones, front-facing optical sensors, such as the ambient light sensor and camera, now must be placed under the transparent screen to capture ambient light and visual information. Motivated by this trend, we propose Through-Screen Computing in this dissertation. It is a new concept that refers to the processing of light signals for various computing purposes such as communication, sensing, and imaging, where the light comes from the physical world and passes through a special medium -- the transparent screen -- before reaching the under-screen optical sensors. This concept opens up new challenges and opportunities in connectivity, privacy, and security of future devices equipped with transparent screens. In this dissertation, we outline a vision for through-screen computing and address the challenges of transparent screens acting as both passive blockers and active interferers of input light signals.

This dissertation focuses on two subsystems in the context of through-screen computing: Through-Screen Visible Light Communication (VLC) and Screen Perturbation for Visual Privacy Protection. In the context of VLC, the full-screen trend challenges the deployment of this technology. We propose Through-Screen VLC with under-screen optical sensors as receivers. To address the attenuation of the light by the transparent screen, we develop SpiderWeb, a system exploiting the color domain to mitigate the color-pulling effect introduced by the transparent screen. We also leverage the Under-Screen Camera (USC) for VLC and design novel demodulation algorithms to reduce multi-pixel screen interference and improve performance. Experimental results show significant improvements in both data rate and transmission range, using a prototype we built with two commercial smartphones. For privacy protection, we propose Screen Perturbations, modifying pixels displayed on the transparent screen to embed speckled color blocks and color shifts in the final image captured by the USC. Screen perturbations can be exploited to disrupt advanced deep neural networks used on image classification and face recognition tasks. We first design two image-specific methods to add screen perturbations to the images captured by the USC. Next, we develop Unicorn, a universal screen perturbation method optimized for robustness in various conditions. All these designed perturbations work successfully against various deep neural network-based image classification services with high success rates.

Through these two subsystems, as well as the proposed theoretical and experimental approaches and results, we demonstrate the transformative potential of through-screen computing, setting the stage for future research and development on various computing purposes in the era of transparent screens and full-screen devices.
...

Utilizing the sun to establish wireless connections

Nowadays, wireless connectivity is ubiquitous: humans use smartphones, smartwatches, laptops and other devices, while at the same time, the Internet of Things (IoT) is adding millions of connected objects. This large number of devices uses mainly the radio frequency (RF) spectrum for communication. A direct consequence of this exponential growth is the scarcity of free RF bands to cope with this demand.

To tackle this challenge, researchers have proposed using a different carrier: visible light. With Visible Light Communications (VLC), devices communicate with each other by modulating the intensity of their light-emitting diodes (LEDs) and demodulating it using light sensors. The key advantage of VLC is the utilization of the visible light spectrum, with free bands that do not interfere with traditional RF systems. Nonetheless, despite the efficiency of LED technology, luminaires still require several watts to generate light. The need for this considerable amount of energy has triggered interest in a new research area: Passive VLC. The fundamental principle of Passive VLC is to exploit ambient light to create wireless links, thus reducing the energy required by transmitters to generate their own light.

Passive VLC is a promising area, but poses a daring challenge: modulate light without any control over the source. The research community has proposed using optical surfaces that block or reflect light dynamically as modulators, but these platforms provide limited data rates, ranging from a few tens of bps to a few kbps. Moreover, using the sun as the source of ambient light introduces another challenge: variations in position and intensity.

This dissertation aims to improve the performance of Passive VLC systems operating with sunlight, with a particular focus on increasing the data rate and resilience to the changing sun’s position.

Our first contribution is a short-range wireless link using a tiny screen as a transmitter and a camera as a receiver. The screen is a reflective surface, adapted to work with ambient light. The sunlight reaching the screen is modulated to transmit information to a smartphone’s camera, creating a stream of optical data. This screen-to-camera link using sunlight attains up to 10 kbps, ten times faster than previous similar systems, working from sunrise to sunset - independent of the sun’s position.

Inspired by the concept of Li-Fi, which combines illumination and VLC, our second contribution envisions the creation of a natural light bulb with wireless communication capabilities. Our design combines optical modulators, optical filters and sunlight collectors to track the sun’s position during the day and radiate modulated beams of sunlight in indoor scenarios. These beams of natural light provide illumination and communication and are the first to divide sunlight into two color channels to double the data rate.

Our third contribution proposes a novel link for robots to communicate using sunlight. We leverage a material used in solar technology, the Luminescent Solar Concentrator (LSC). An LSC surface absorbs light from its top and emits it on its edges. We place LSCs on top of robots, together with liquid crystal cells (LCs), so sunlight arriving from the top can be modulated into data packets transmitted toward the edges. This novel communication system allows task coordination between robots using sunlight.

Overall, this dissertation presents new Passive VLC systems focusing on applications that exploit the sun as the light source. Within this scenario, our focus has been to increase the data rate, with the first two contributions, and on making the systems resilient to the sun’s position, with all three contributions. ...

A smart vest to provide visible light communication inside pockets

Master thesis (2024) - J.A. Wesdorp, M.A. Zuñiga Zamalloa
Visible light communication (VLC) has gained attention recently as radio frequencies become increasingly congested. VLC offers a promising alternative for wireless communication with several advantages: It provides 10 times more bandwidth than traditional radio frequencies, is more energy-efficient and secure, and can take advantage of the existing lighting infrastructure.
However, VLC also has drawbacks, such as its susceptibility to ambient light interference and its dependence on a clear line of sight (LOS). When the receiver is obstructed, such as being placed in a pocket, the signal is blocked, and communication fails.

We address one of the most important NLOS scenarios in VLC: when users place the receiver inside the pocket. Our system places photodiodes on a 3D-printed vest to capture the optical data and then forwards the information to the phone inside the pocket using near-field communication (NFC).

We introduce several optimizations to enhance the performance of LightVest. First, we develop a novel method for optimizing photodiode placement on the vest using the Lambertian propagation model, ensuring optimal angles for maximum signal reception. Additionally, we implement adaptive filtering and threshold techniques to maintain reliable communication in dynamic environments, improving the VLC system's robustness against noise and movement. We also optimize the software to increase the sampling rate, reducing processing times. These improvements result in a maximum data rate of 25 kbps and a range of 220 cm at a data rate of 5 kbps with a bit error rate of 0.025.

We enhanced the NFC link using techniques like Fast Transfer Mode and non-blocking I2C to achieve a maximum data rate of 21 kbps. To facilitate user interaction with the LightVest, we developed an Android application to control the microcontroller. In addition, it provides data visualization and collection, significantly speeding up the debugging and experimentation processes.

Overall, LightVest represents an advancement in extreme NLOS and wearable VLC, paving the way for future innovations in secure and wearable VLC solutions.
Future work could focus on improving the performance of the VLC link by selecting a more powerful microcontroller, using enhanced filtering, and adopting a more advanced modulation scheme. Future efforts could also include adding an uplink to the system to complete the VLC setup and exploring alternative vest designs by using a vest or shirt instead of a 3D model. ...
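The Lambertian propagation model mentioned above as the basis for photodiode placement can be sketched in a few lines. This is the standard line-of-sight Lambertian channel gain from the optical wireless literature, not the thesis's placement optimizer; optical filter and concentrator gains are omitted, and the function names, the default 60° half-power angle, and the 90° field of view are assumptions made for this illustration.

```python
import math

def lambertian_order(half_power_angle_deg):
    """Lambertian mode number m from the LED's half-power semi-angle."""
    return -math.log(2) / math.log(math.cos(math.radians(half_power_angle_deg)))

def los_channel_gain(distance_m, irradiance_deg, incidence_deg, area_m2,
                     half_power_angle_deg=60.0, fov_deg=90.0):
    """Line-of-sight DC gain: (m+1)A / (2*pi*d^2) * cos^m(phi) * cos(psi).

    irradiance_deg (phi): angle between the LED axis and the link.
    incidence_deg (psi):  angle between the photodiode normal and the link.
    Light arriving outside the receiver's field of view contributes nothing.
    """
    if incidence_deg > fov_deg:
        return 0.0
    m = lambertian_order(half_power_angle_deg)
    phi = math.radians(irradiance_deg)
    psi = math.radians(incidence_deg)
    return ((m + 1) * area_m2 / (2 * math.pi * distance_m ** 2)
            * math.cos(phi) ** m * math.cos(psi))

# Gain for a photodiode facing the LED head-on versus tilted by 45 degrees.
head_on = los_channel_gain(1.0, 0.0, 0.0, area_m2=1.0)
tilted = los_channel_gain(1.0, 0.0, 45.0, area_m2=1.0)
```

Because the gain falls with the cosine of the incidence angle, a photodiode whose normal points toward the light source receives the most power, which is the intuition behind choosing placement angles on a curved garment.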
Master thesis (2024) - Z. Lou, M.A. Zuñiga Zamalloa, Q. Wang, Jorge Abraham Martinez Castaneda, Talia Xu
Indoor localization technology has become increasingly crucial as the demand for precise and reliable positioning systems grows across various applications. Traditional methods, such as vision-based techniques, radio signal-based technologies (including UWB, WiFi, RFID, and Bluetooth), and visible light-based technologies, offer unique advantages and limitations. Among these, visible light positioning (VLP) stands out for its potential to provide high accuracy by leveraging the characteristics of light signals.

This thesis explores the integration of VLP with a balloon-enabled drone—a novel UAV setup featuring a buoyant balloon that extends flight duration. A balloon-enabled drone introduces both opportunities and challenges for VLP methods due to its size. Its large surface area can block light paths, which may impact signal reception and positioning accuracy. On the other hand, it also allows for the use of multiple receivers across the surface, potentially improving positioning reliability.

Traditional VLP systems typically utilize multiple transmitters and a single receiver; however, our approach takes advantage of the large surface area of a balloon-enabled drone by using only a single transmitter with multiple receivers strategically positioned on the balloon. This setup leverages the balloon’s curved surface to capture a diverse range of light intensities and angles, thereby improving positioning accuracy. We developed a 2D+H RSS-based VLP model specifically designed for balloon-enabled drones. This model takes into account factors like light transmission and optical channel loss. Our VLP system includes multiple receivers placed on the balloon’s surface and a single transmitter. We analyzed the optimal number and placement of these receivers to enhance positioning accuracy.

The system’s performance was tested through both static and dynamic experiments. In static tests, our configuration achieved an average positioning error of 4 cm. During dynamic tests, which involved movement and tilt, the mean error increased to 10-12 cm, largely due to difficulties in estimating height and managing tilt angles. Overall, our system shows an improvement over existing positioning methods like Crazyflie, while also maintaining low energy consumption and computational complexity. This work highlights the potential of our VLP model to improve the positioning accuracy of balloon-enabled drones for various applications. ...
Modern building facades and indoor partition walls feature large amounts of transparency for sufficient lighting and social safety. However, this transparency leads to concerns about privacy invasion, as sensitive objects, such as computer monitors, are exposed to onlookers. The advent of advanced screen technology has introduced VideowindoW, a smart installation capable of adjusting the transparency of its pixels to create a self-fading window, potentially addressing these privacy concerns. This thesis investigates the combined use of these smart windows together with mmWave radar technology, as a non-intrusive method to perform human posture estimation among multiple people. It develops a real-time system that detects passersby, localizes their head/eyes and obstructs their line-of-sight to sensitive indoor content, by projecting opaque squares on the smart screen. A notable gap in the mmWave literature is the insufficient handling of posture estimation challenges posed by multiple and dynamically moving targets. To address this gap, we propose the first, to our knowledge, mmWave-based Multi-Person Pose Estimation (MPPE) system. This system combines and improves two state-of-the-art methods for tracking and posture estimation and introduces a novel dataset for dynamic targets, including ground truth data for 19 human joints. Our solution demonstrated a 20% improvement in joint localization Mean Absolute Error (MAE) over the baseline system, in offline experiments with a single dynamic target. Furthermore, it achieved a mean blocking accuracy of 92% in online evaluations involving multiple people and varying environments. These results highlight a promising application in privacy shielding and lay the groundwork for further research in mmWave posture estimation in more unconstrained scenarios. ...
Master thesis (2024) - F.E. Joosen, M.A. Zuñiga Zamalloa, M. Xu, Q. Wang
Visible Light Communication (VLC) leverages the visible light spectrum to establish wireless communication, offering advantages such as broader bandwidth and reduced energy consumption compared to traditional radio frequency methods. VLC offers two main approaches: passive and active. Passive VLC takes advantage of sunlight, which is pervasive and highly power-efficient. However, its reliability can be affected by weather conditions and the absence of sunlight at night. Active VLC, on the other hand, uses artificial light sources like LEDs and provides more consistent performance, but it is not power-efficient when sunlight is available. For example, during the day, when ample sunlight could be used for passive VLC, turning on a light bulb for active VLC is unnecessary and wasteful.

This thesis tackles these challenges by combining the best of active and passive systems to create an even more power-efficient and reliable system. It addresses two key problems in passive VLC: reducing the power consumption of passive VLC transmitters and enhancing the reliability of passive VLC links through a hybrid system. By replacing the FPGA-based controller with a low-power microcontroller, the power consumption of the Digital Micro-mirror Device used for sunlight modulation was significantly reduced from 1.3 W to 36.85 mW, while achieving a data rate of 25 kbps with a bit error rate (BER) of less than 1% at a distance of 25 cm. Its maximum range was determined to be 75 cm at 10 kbps. Additionally, integrating an LED component into the passive VLC communication link improved reliability in varying ambient light conditions. The hybrid system demonstrated enhanced performance in low ambient light scenarios, ensuring a BER below 1% regardless of ambient light conditions. In high ambient light scenarios, the LED can be dimmed or turned off, conserving power and making the system more efficient than a purely active VLC system. This thesis contributes to the advancement of energy-efficient and reliable VLC technologies, paving the way for their broader adoption. ...

Monitoring people without cameras: Privacy is important!

Human Pose Estimation using Millimeter Wave radars has emerged as a promising alternative to traditional camera-based systems, addressing privacy and deployment constraints. While state-of-the-art Deep Learning models predominantly focus on spatial feature extraction to determine the positions of key points in the human body, this research investigates the effects of incorporating temporal dynamics in such models. It focuses on modifying an existing state-of-the-art spatial model to account for temporal dynamics and compares the performance of the two models. Long Short-Term Memory networks are used to capture temporal dependencies between frames of point clouds, which significantly boosts the precision of key point detection. The proposed temporal model demonstrates a 53% reduction in Mean Absolute Error and a 45% reduction in Root Mean Squared Error compared to the state-of-the-art model. Moreover, these improvements were achieved with a less complex model architecture and similar training times. The robustness of the model was further validated on a different dataset, showcasing its potential for broad application in fields such as healthcare, sports analysis, traffic monitoring, and robotics. This study underscores the efficacy of temporal dynamics in pose estimation and showcases the advantages of accounting for temporal dependencies when evaluating more complex movements. ...

An Extension on PointNet and MARS through LSTM Integration

The People Counting Problem requires calculating the number of people in a region of interest. This is needed in crowd-monitoring scenarios but has become increasingly problematic when relying on video cameras, as they raise privacy concerns. Instead, we propose using a mmWave radar to detect people by creating point clouds from their radar signal reflections. This approach, however, can pose challenges when people walk closely together because their individual point clouds overlap and are seen as a single, larger cloud. It is difficult to count how many individuals this large point cloud holds, which can lead to miscounting the people in the scene. One approach to address this issue is leveraging the time dimension in sequences of people walking, which can be done with Long Short-Term Memory (LSTM) models. Given this, we investigate how two state-of-the-art models, PointNet and MARS, perform for people counting from point clouds when extended through LSTMs. The results show that both PointNet and MARS improve in performance when extended by LSTMs. In particular, despite having over double the parameters, MARS+LSTM outperforms PointNet+LSTM in terms of accuracy and computational efficiency. MARS+LSTM can effectively capture small changes in the local structure of point clouds between frames, which PointNet loses due to max pooling. This highlights the importance of selecting a model architecture, like the CNN in MARS, that aligns with the data characteristics to maximise performance. ...

Reducing Stationary Target Noise in Tracking and Movement Reconstruction

Interactive video games often use vision-based systems or wearables to track player movements. Vision-based systems are privacy-invasive, and wearables require frequent recalibration and recharging. Frequency-Modulated Continuous-Wave (FMCW) radars have been proposed as an alternative tracking solution addressing these problems. Working in the millimeter-wave (mmWave) range, they capture scenes as point clusters, ensuring privacy without attaching sensors to the user. Previous research has shown their applicability in rehabilitation, gait recognition, and smart home appliances. This study focuses on integrating an mmWave sensing device with an interactive version of the Breakout video game. We propose a general framework with three main modules: generating points, clustering them, and reconstructing the player's movements as in-game commands. Our system enhances an existing Kalman filter-based tracking algorithm. Online experiments were conducted to compare the proposed system to the baseline algorithm. The proposed system decreases the standard deviation of the estimated target location by 33% for motionless targets, while maintaining the baseline accuracy when tested on moving targets. Furthermore, it allows a higher game refresh rate, thus smoothing in-game movements. These results demonstrate the potential of FMCW radars in enhancing interactive video game experiences. ...