Q. Song | TU Delft Repository

Privacy Protection via Imperceptible Face Masking: A Dynamic Approach based on HyperNet

Master thesis (2024) - Q. Yang, Q. Song, K.G. Langendoen, M.A. Zuñiga Zamalloa

The proliferation of video recording devices and facial recognition technology has led to significant privacy concerns, as surveillance systems can capture and identify individuals without their consent. Traditional facial obfuscation systems, which introduce pixel-level perturbations to images, aim to protect privacy by preventing unauthorized facial recognition. However, these systems are vulnerable to inversion attacks, where attackers can reverse the perturbations to restore original images, compromising privacy. This thesis addresses these vulnerabilities by proposing HyperObf, a novel approach utilizing HyperNet technology to generate unique obfuscation networks for each user. HyperObf ensures that each user’s images are distinctly protected, making it challenging for attackers to reverse-engineer the obfuscations. Our experiments demonstrate that inversion attacks can significantly degrade the protection offered by static obfuscation systems, with restored images achieving face recognition accuracy close to that of original images.
In contrast, HyperObf effectively mitigates these attacks, reducing the attack success rate to 30% compared to 60% for existing methods. Additionally, HyperObf can generate 100 personalized MaskNets in 0.2 seconds using high-performance computing resources. These findings highlight the potential of HyperObf to enhance privacy protection against unauthorized facial recognition and inversion attacks in the digital age. ...

Towards Efficient Deep Learning Based Siren Detection

Master thesis (2024) - L.S.S. Bollapragada, K.G. Langendoen, Q. Song, Andreas Lenz, Bruno Defraene

This thesis presents the development and evaluation of a real-time neural network-based audio classification system designed on an NXP HW board to distinguish emergency response vehicles by their sirens from other vehicles. At the core of the system is a deep learning model that processes audio inputs captured via a microphone, classifying them based on the presence of siren sounds. The system achieves this by extracting audio features and running inference through the designed neural network, followed by post-processing to detect sirens accurately. Audio signals are transformed into mel-spectrograms, which represent the frequency spectrum over time using a specific window size for analysis. The neural network leverages these mel-spectrogram features to perform audio classification.
The deployment of this system involves several key steps. First, the model is trained on diverse audio data, including siren and non-siren sounds. Audio signals are transformed into mel-spectrograms, which capture the frequency spectrum over time. The neural network processes these features to classify the audio based on the presence of siren sounds. The classified results undergo post-processing to enhance detection accuracy. The system is tested in real-world scenarios, demonstrating a turnaround time of less than 3 seconds even under high noise conditions. Various trade-offs are evaluated to improve efficiency, reduce memory size, and minimize latency, ensuring the system meets requirements for model size, latency, and compute cycles. The custom dataset comprises 280 hours of audio, including well-known, publicly available datasets such as ESC-50, AudioSet, and UrbanSound. This dataset is enriched with both original and augmented siren sounds and non-siren audio to enhance the model’s learning efficacy and robustness. The system achieves 96.19% test accuracy in identifying sirens and is suitable for deployment in real-world scenarios, even for SNRs as low as -30 dB. Although the system meets most requirements for model size, latency, and compute cycles, the false positive rate needs improvement. This can be achieved by expanding the dataset and retraining the model.
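To put the -30 dB figure in context, the following is a minimal sketch of how a signal-to-noise ratio in decibels is computed from sample power. The function names and the toy sample values are illustrative assumptions and are not taken from the thesis.

```python
import math

def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))

# Hypothetical toy data: the noise carries 1000 times the power of the siren.
siren = [1.0, -1.0] * 4
noise = [math.sqrt(1000.0), -math.sqrt(1000.0)] * 4

ratio = snr_db(siren, noise)  # approximately -30 dB
```

An SNR of -30 dB therefore means the noise power is roughly a thousand times the siren power.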

Leveraging Large Foundation Models for Zero-Shot IoT Sensing

Master thesis (2024) - D. XUE, K.G. Langendoen, Q. Song, Z. Yue

Deep learning models are now widely deployed on edge IoT devices. However, most of these models are trained under supervised conditions and can only recognize the seen classes learned during training. Zero-shot learning (ZSL) is a popular method for identifying unseen classes by leveraging semantic information from both seen and unseen classes. Foundation models (FMs) trained on web-scale data have shown impressive ZSL capability in natural language processing and visual understanding. However, leveraging FMs' generalized knowledge for zero-shot Internet of Things (IoT) sensing using signals such as mmWave, IMU, and Wi-Fi has not been fully investigated. In this work, we align the IoT data embeddings with the semantic embeddings generated by an FM's text encoder for zero-shot IoT sensing. To utilize the physics principles governing the generation of IoT sensor signals to derive more effective prompts for semantic embedding extraction, we propose a multi-source information fusion strategy, cross-attention, to combine a hard prompt generated by Large Language Models (LLMs) with a soft prompt consisting of learnable vectors. To address the problem of IoT embeddings being biased toward seen classes due to the lack of unseen-class data during training, we propose using data augmentation to synthesize unseen-class IoT data for fine-tuning the IoT feature extractor and embedding projector. We evaluate our approach on multiple IoT sensing tasks. Experimental results show that our approach achieves an average improvement of 1.0% in open-set detection and 9.5% in generalized zero-shot learning compared with multiple baselines on three datasets. ...
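The inference step implied by aligning sensor embeddings with text embeddings can be sketched as a nearest-neighbour lookup by cosine similarity. This is an illustrative assumption about how such an aligned space might be queried, not the thesis implementation; the class labels, vectors, and function names are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_predict(sensor_embedding, class_text_embeddings):
    """Return the label whose text embedding is closest to the sensor embedding."""
    return max(
        class_text_embeddings,
        key=lambda label: cosine_similarity(
            sensor_embedding, class_text_embeddings[label]
        ),
    )

# Hypothetical text embeddings; "falling" stands in for a class unseen in training.
class_text_embeddings = {
    "walking": [0.9, 0.1, 0.0],
    "sitting": [0.1, 0.9, 0.0],
    "falling": [0.0, 0.2, 0.9],
}
# Hypothetical projected embedding of an IoT signal window.
sensor_embedding = [0.1, 0.3, 0.8]

predicted = zero_shot_predict(sensor_embedding, class_text_embeddings)
```

Because the comparison needs only a text embedding per class, a new class can be added by supplying its description, without sensor data for that class.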

Comparative Study of Passive and Active Acoustic Sensing for Indoor Room Recognition

Bachelor thesis (2023) - J.M. Chan, Q. Song, J.A. Martinez Castaneda

The ability to accurately determine the location within indoor settings is crucial for various applications such as indoor navigation, interactive floor plans, and room-specific services. While GPS technology has revolutionized outdoor positioning, it falls short in providing precise location information within buildings due to signal blockage. To address this limitation, specialized indoor positioning systems utilizing acoustic sensing have been explored, leveraging deep learning models. This paper presents a comparative study of passive and active acoustic sensing systems for room recognition. Passive sensing involves capturing existing background noise in a room and using it as an identifier, while active sensing emits acoustic signals and analyzes the resulting echoes. Previous research has primarily focused on active sensing, achieving high classification accuracy but facing challenges related to device orientation and the presence of multiple individuals. Moreover, the emission of high-frequency chirps used in active sensing may cause discomfort to pets. The results indicate that passive sensing achieves an accuracy of 73.7%, slightly outperforming active sensing at 63.5% in baseline conditions. However, in the presence of constant background noise, passive sensing accuracy drops to 21.7%, while active sensing exhibits better resilience with an accuracy of 59.7%. Furthermore, when the device orientation is altered by 90 degrees, active sensing accuracy drops to 45.5%, while passive sensing maintains better performance at 71%. The presence of multiple individuals in the room had a relatively minor effect on passive sensing, which achieved an accuracy of 72.2%. Active sensing was less resilient, reaching an accuracy of 44.2%.
...

Indoor Location Sensing Using Smartphone Acoustic System

What kind of deep models could be used for indoor location recognition? How to deploy and evaluate the model on smartphones and make the inference run in real time?

Bachelor thesis (2023) - R.N. Sozonov, Q. Song, J.A. Martinez Castaneda

Indoor localization is a field still under active development, and various solutions have been introduced in recent years. Some of them use beacons, Wi-Fi access points, various smartphone sensors, or acoustic sensing to perform localization. This paper presents an application that uses acoustic sensing data to perform localization with different deep models. The research aims to explore different models and evaluate both their classification performance on three different acoustic data sets and their overhead on the system. Two architecture designs are implemented: a client-server design, in which the models are stored on the server, and a front-end-only design, in which compressed models are used. The results show that the client-server approach outperforms the front-end-only design, with the former's models reaching classification results of 98%, 90%, and 90% on the three data sets, although fetching a prediction result from the server takes longer than using the compressed models stored on the smartphone. ...

Indoor Location Sensing Using Smartphone Acoustic System

Impact of interferences on the performance of acoustic indoor location system

Bachelor thesis (2023) - F.K. Biliński, Q. Song, J.A. Martinez Castaneda

Indoor localization is an actively researched field because no universal solution has been found yet. Applications of such systems include, but are not limited to, indoor wayfinding and automated tour guides. Multiple solutions have been proposed in previous years. This work examines the performance of an indoor location sensing system in the presence of background music and tries to improve the accuracy in such a scenario. To that end, a denoising autoencoder is proposed as a preprocessing step that aims to remove the noise from the fingerprints used for localization. In the end, it is shown that this technique introduces a trade-off: accuracy drops in quiet environments but increases in environments with music. ...

Indoor Location Sensing Using Smartphone Acoustic System

Combining acoustic and WiFi localization

Bachelor thesis (2023) - D. Kažemaks, Q. Song, J.A. Martinez Castaneda

Indoor localization is an important field of research for advancing robotics and providing more accurate estimations of indoor locations for users. There are many indoor localization algorithm implementations, but many of them underperform under certain environmental changes or restrictions. This research presents a way of combining existing indoor localization techniques to deduce the user’s location within a building more accurately. An experiment was conducted within the campus building Pulse, where multiple fingerprints of locations were gathered and then used to train and test the combined classification models. By fusing active acoustic location sensing and WiFi localization using weighted averaging, ensemble stacking, and 2-step localization, the combination of classifiers was able to outperform individual classifiers by up to 5% in localization accuracy. Additionally, the 2-step localization and weighted averaging methods did not add any performance overhead. ...
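Of the three fusion methods named above, weighted averaging is the simplest to illustrate. The sketch below assumes each classifier outputs a probability per location; the room labels, probabilities, and the weight of 0.4 are hypothetical values chosen for illustration, not figures from the thesis.

```python
def weighted_average_fusion(acoustic_probs, wifi_probs, acoustic_weight=0.5):
    """Blend two per-location probability maps with a single weight in [0, 1]."""
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must be between 0 and 1")
    locations = acoustic_probs.keys() | wifi_probs.keys()
    return {
        loc: acoustic_weight * acoustic_probs.get(loc, 0.0)
        + (1.0 - acoustic_weight) * wifi_probs.get(loc, 0.0)
        for loc in locations
    }

def predict_location(fused_probs):
    """Pick the location with the highest fused probability."""
    return max(fused_probs, key=fused_probs.get)

# Hypothetical classifier outputs for one measurement.
acoustic = {"room_a": 0.6, "room_b": 0.3, "room_c": 0.1}
wifi = {"room_a": 0.2, "room_b": 0.7, "room_c": 0.1}

fused = weighted_average_fusion(acoustic, wifi, acoustic_weight=0.4)
best = predict_location(fused)
```

Here the two classifiers disagree, and the fused estimate follows the WiFi classifier because it carries the larger weight. The blend is a single pass over the locations, so it costs little compared with running either classifier.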

Optimal data capturing for indoor location sensing

Bachelor thesis (2023) - R.O. van Heerde, Q. Song, J.A. Martinez Castaneda

In today’s world, accurate location sensing has become indispensable. One of the most prominent and widely used techniques for determining location is GPS. Outdoors, GPS is capable of pinpointing a location with an error of only a few meters. Inside buildings, however, GPS often fails to deliver the same accuracy. In this paper, a relatively new technique is presented to solve this problem: acoustic location sensing, in which a smartphone emits inaudible chirps and records the result. Specifically, this paper covers what kind of data is needed to train the deep model that solves this problem. ...
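As a rough illustration of the kind of signal such a system emits, the sketch below generates a linear chirp. The 18 to 20 kHz band, the 10 ms duration, and the 44.1 kHz sample rate are assumptions chosen for the example; the abstract does not state the parameters actually used.

```python
import math

def linear_chirp(f_start, f_end, duration, sample_rate):
    """Samples of a sine wave whose frequency sweeps linearly from f_start to f_end."""
    n_samples = round(duration * sample_rate)
    sweep_rate = (f_end - f_start) / duration  # Hz per second
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # Phase is the integral of the instantaneous frequency over time.
        phase = 2.0 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples

# Assumed near-ultrasonic sweep that most adults cannot hear.
chirp = linear_chirp(18000.0, 20000.0, duration=0.01, sample_rate=44100)
```

The recorded echo of such a chirp, rather than the chirp itself, would serve as the location fingerprint fed to the model.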