Weakly-supervised Learning for Fine-grained Emotion Recognition using Physiological Signals

Journal Article (2023)
Author(s)

Tianyi Zhang (Centrum Wiskunde & Informatica (CWI), TU Delft - Multimedia Computing)

Abdallah El Ali (Centrum Wiskunde & Informatica (CWI))

Chen Wang (Xinhua News Agency, Beijing)

Alan Hanjalic (TU Delft - Intelligent Systems)

Pablo Cesar (Centrum Wiskunde & Informatica (CWI), TU Delft - Multimedia Computing)

Department
Intelligent Systems
Copyright
© 2023 T. Zhang, Abdallah El Ali, Chen Wang, A. Hanjalic, Pablo Cesar
DOI related publication
https://doi.org/10.1109/TAFFC.2022.3158234
Publication Year
2023
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Issue number
3
Volume number
14
Pages (from-to)
2304-2322
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Instead of predicting just one emotion for one activity (e.g., video watching), fine-grained emotion recognition enables more temporally precise recognition. Previous works on fine-grained emotion recognition require segment-by-segment, fine-grained emotion labels to train the recognition algorithm. However, experiments to collect these labels are costly and time-consuming compared with collecting only one emotion label after the user has watched the stimulus (i.e., the post-stimuli emotion label). To recognize emotions at a finer granularity level when trained with only post-stimuli labels, we propose an emotion recognition algorithm based on Deep Multiple Instance Learning (EDMIL) using physiological signals. EDMIL recognizes fine-grained valence and arousal (V-A) labels by identifying which instances represent the post-stimuli V-A annotated by users after watching the videos. Instead of being trained with full supervision, the instances are weakly supervised by the post-stimuli labels in the training stage. The V-A of the instances is estimated from the instance gains, which indicate the probability that the instances predict the post-stimuli labels. We tested EDMIL on three different datasets, CASE, MERCA and CEAP-360VR, collected in three different environments: desktop, mobile and HMD-based Virtual Reality, respectively. Recognition results validated with the fine-grained V-A self-reports show that for subject-independent 3-class classification (high/neutral/low), EDMIL obtains promising recognition accuracies: 75.63% and 79.73% for V-A on CASE, 70.51% and 67.62% for V-A on MERCA, and 65.04% and 67.05% for V-A on CEAP-360VR. Our ablation study shows that all components of EDMIL contribute to both the classification and regression tasks.
Our experiments also show that (1) compared with fully-supervised learning, weakly-supervised learning can reduce the problem of overfitting caused by the temporal mismatch between fine-grained annotations and physiological signals, (2) instance segment lengths of 1-2 s result in the highest recognition accuracies, and (3) EDMIL performs best when the post-stimuli annotations account for less than 30% or more than 60% of the entire video-watching period.
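
The weakly-supervised setup described in the abstract can be illustrated with a minimal Python sketch of the data preparation step only: one recording is split into fixed-length instances, and the whole set (the "bag") carries a single post-stimuli label. This is not the EDMIL implementation. The function name `make_bag`, the single-channel 1-D signal, the 50 Hz sampling rate, the synthetic data and the label string are all assumptions made for illustration; only the 1-2 s instance length comes from the abstract.

```python
import numpy as np


def make_bag(signal, sample_rate_hz, instance_seconds, bag_label):
    """Split one 1-D recording into equal-length instances sharing one label.

    Hypothetical helper: it illustrates weak supervision, where individual
    instances have no labels of their own and only the bag is labeled.
    """
    signal = np.asarray(signal, dtype=float)
    samples_per_instance = int(round(sample_rate_hz * instance_seconds))
    if samples_per_instance <= 0:
        raise ValueError("instance length must cover at least one sample")
    n_instances = len(signal) // samples_per_instance
    # Drop the trailing partial segment so every instance has equal length.
    trimmed = signal[: n_instances * samples_per_instance]
    instances = trimmed.reshape(n_instances, samples_per_instance)
    return {"instances": instances, "label": bag_label}


# Assumed example: 10 s of synthetic signal sampled at 50 Hz, 2 s instances.
rng = np.random.default_rng(0)
recording = rng.standard_normal(500)
bag = make_bag(recording, sample_rate_hz=50, instance_seconds=2.0, bag_label="high")
print(bag["instances"].shape)  # (5, 100)
```

A model trained on such bags sees only `bag["label"]` during training; estimating a label for each row of `bag["instances"]` is the part the paper's method addresses and is not shown here.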

Files

Weakly_Supervised_Learning_for... (pdf)
(pdf | 1.56 Mb)
- Embargo expired on 05-10-2023
License info not available