Leveraging Large Foundation Models for Zero-Shot IoT Sensing

Master Thesis (2024)
Author(s)

D. XUE (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Koen Langendoen – Graduation committee member (TU Delft - Embedded Systems)

Q. Song – Mentor (TU Delft - Embedded Systems)

Z. Yue – Graduation committee member (TU Delft - Multimedia Computing)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
28-06-2024
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Deep learning models are now widely deployed on edge Internet of Things (IoT) devices. However, most of these models are trained in a supervised manner and can recognize only the classes seen during training. Zero-shot learning (ZSL) is a popular approach for recognizing unseen classes by leveraging semantic information about both seen and unseen classes. Foundation models (FMs) trained on web-scale data have shown impressive ZSL capability in natural language processing and visual understanding. However, using the generalized knowledge of FMs for zero-shot IoT sensing with signals such as mmWave, IMU, and Wi-Fi has not been fully investigated. In this work, we align IoT data embeddings with the semantic embeddings produced by an FM's text encoder to enable zero-shot IoT sensing. To exploit the physical principles that govern how IoT sensor signals are generated, and thereby derive more effective prompts for semantic embedding extraction, we propose a multi-source information fusion strategy based on cross-attention that combines a hard prompt generated by large language models (LLMs) with a soft prompt consisting of learnable vectors. To mitigate the bias of IoT embeddings toward seen classes, which arises because no unseen-class data are available during training, we use data augmentation to synthesize unseen-class IoT data for fine-tuning the IoT feature extractor and embedding projector. We evaluate our approach on multiple IoT sensing tasks. Experimental results on three datasets show that, compared with multiple baselines, our approach improves open-set detection by 1.0% and generalized zero-shot learning by 9.5% on average.
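The abstract names two mechanisms: cross-attention fusion of a hard prompt and a soft prompt, and zero-shot recognition by matching an IoT embedding against per-class semantic embeddings. The following is a minimal, non-authoritative sketch of both ideas, not the thesis implementation. It rests on several assumptions that are not stated in the abstract: the soft-prompt vectors act as attention queries and the hard-prompt token embeddings as keys and values, learned projection matrices are omitted, and the fused vectors are mean-pooled directly as a stand-in for the foundation model's text encoder. All vectors, including the IoT embedding (which in the thesis would come from the IoT feature extractor and embedding projector), are hypothetical toy values, and every function name is illustrative only.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention without learned projections (a simplification)."""
    d = len(keys[0])
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        fused.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))])
    return fused


def mean_pool(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def class_embedding(soft_prompt: List[Vector], hard_prompt_tokens: List[Vector]) -> Vector:
    # Assumed arrangement: soft prompt = queries, hard-prompt tokens = keys and values.
    # Mean pooling is a hypothetical stand-in for the FM text encoder.
    return mean_pool(cross_attention(soft_prompt, hard_prompt_tokens, hard_prompt_tokens))


def zero_shot_predict(iot_embedding: Vector, class_embeddings: Dict[str, Vector]) -> str:
    # Choose the class whose semantic embedding is most cosine-similar to the IoT embedding.
    return max(class_embeddings, key=lambda c: cosine(iot_embedding, class_embeddings[c]))


# Toy, hypothetical 2-D values for illustration only.
soft_prompt = [[1.0, 1.0], [0.5, -0.5]]  # learnable vectors, shared across classes
hard_prompts = {
    "walking": [[1.0, 0.0], [0.8, 0.2]],  # token embeddings of an LLM-written class description
    "falling": [[0.0, 1.0], [0.1, 0.9]],
}
class_embeddings = {c: class_embedding(soft_prompt, toks) for c, toks in hard_prompts.items()}
iot_embedding = [0.9, 0.2]  # would come from the IoT feature extractor and projector
prediction = zero_shot_predict(iot_embedding, class_embeddings)
print(prediction)  # walking
```

Because the prediction is an argmax over whatever classes appear in the dictionary, an unseen class can be scored simply by supplying its prompt; in the generalized zero-shot setting evaluated in the thesis, seen and unseen classes are scored together.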

Files

DinghaoXue_Master_Thesis.pdf
(pdf | 7.39 MB)
- Embargo expired on 28-12-2024
License info not available