Multi-representation Emotion Recognition in Immersive Environments

Master Thesis (2024)
Author(s)

Tony Yang (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Guohao Lan – Mentor (TU Delft - Embedded Systems)

X. Zhang – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

K.G. Langendoen – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
22-10-2024
Awarding Institution
Delft University of Technology
Programme
Electrical Engineering | Embedded Systems
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This study addresses the lack of fine-grained emotion recognition in immersive environments that relies solely on data from on-board sensors. Two data representations of users' eyes are used: periocular recordings and eye-movement signals (gaze estimates and pupil measurements). A novel multi-representation method is proposed that integrates a feature extractor for each representation with an effective feature-fusion technique. The method significantly outperforms baselines that use only a single representation or that incorporate the content stimuli. It achieves an F1-score of 0.85 with 10% of the data (approximately 40 seconds covering all emotions) used for personal adaptation, recognizing emotions while users watch unseen parts of the stimuli used for adaptation. In a more practical scenario, the method achieves an F1-score of 0.71 with five seconds of personal adaptation data per emotion, recognizing emotions while users watch completely unseen stimuli. Under the same scenario but a more extreme condition, where only one second of adaptation data is available, the proposed method achieves an F1-score of 0.68.
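As a rough illustration of such a two-branch design, the sketch below encodes periocular frames with a small CNN and eye-movement signals with a GRU, then fuses the two feature vectors by concatenation before classification. This is a minimal sketch under assumed input shapes, layer sizes, emotion-class count, and PyTorch as the framework; it is not the thesis's actual architecture.

import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    """Minimal two-branch model: image encoder + signal encoder + fused classifier."""

    def __init__(self, num_emotions: int = 4, feat_dim: int = 128):
        super().__init__()
        # Branch 1: periocular frames (3x64x64 crops are an assumption).
        self.periocular = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Branch 2: eye-movement time series; 3 channels assumed
        # (gaze x, gaze y, pupil diameter).
        self.eye_movement = nn.GRU(input_size=3, hidden_size=feat_dim,
                                   batch_first=True)
        # Fusion by concatenation, then a linear classification head.
        self.classifier = nn.Linear(2 * feat_dim, num_emotions)

    def forward(self, frames, signals):
        f_img = self.periocular(frames)                   # (B, feat_dim)
        _, h_n = self.eye_movement(signals)               # h_n: (1, B, feat_dim)
        fused = torch.cat([f_img, h_n.squeeze(0)], dim=1) # (B, 2 * feat_dim)
        return self.classifier(fused)                     # (B, num_emotions)

# Dummy forward pass: a batch of 8 clips.
model = TwoBranchFusion()
frames = torch.randn(8, 3, 64, 64)    # periocular crops
signals = torch.randn(8, 40, 3)       # 40 time steps of gaze + pupil
logits = model(frames, signals)       # shape: (8, 4)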
Furthermore, the study demonstrates that estimated labels can substitute for user-provided labels without sacrificing recognition performance, eliminating the need for users to manually label emotion-elicitation segments.
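One generic way to exploit estimated labels for personal adaptation is self-training: let the pretrained model label a short adaptation clip, then fine-tune on its own predictions. The sketch below reuses the TwoBranchFusion model from the previous block; the optimizer, learning rate, and step count are assumptions, and the thesis may estimate labels differently.

import torch
import torch.nn.functional as F

def adapt_with_estimated_labels(model, frames, signals, steps=10, lr=1e-4):
    """Fine-tune on a short unlabeled clip using the model's own
    predictions as targets (one plausible form of label estimation)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.eval()
    with torch.no_grad():
        pseudo = model(frames, signals).argmax(dim=1)  # estimated labels
    model.train()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(frames, signals), pseudo)
        loss.backward()
        optimizer.step()
    return model

# Example: adapt on a few seconds of unlabeled per-user data.
adapted = adapt_with_estimated_labels(model, frames, signals)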
Future work will focus on improving performance through additional computational resources and architectural modifications, on deeper investigation of the decision-making process, and on developing real-time recognition systems for in-the-wild experiments.
The results suggest that this approach can enable more engaging, adaptive, and personalized experiences in immersive environments.

Files

License info not available

File under embargo until 22-10-2025