Weakly-supervised Learning for Fine-grained Emotion Recognition using Physiological Signals

Zhang, T.; El Ali, Abdallah; Wang, Chen; Hanjalic, A.; Cesar, Pablo

doi:10.1109/TAFFC.2022.3158234

Title

Weakly-supervised Learning for Fine-grained Emotion Recognition using Physiological Signals

Author

Zhang, T. (TU Delft Multimedia Computing; Centrum Wiskunde & Informatica (CWI))
El Ali, Abdallah (Centrum Wiskunde & Informatica (CWI))
Wang, Chen (Xinhua News Agency, Beijing)
Hanjalic, A. (TU Delft Intelligent Systems)
Cesar, Pablo (TU Delft Multimedia Computing; Centrum Wiskunde & Informatica (CWI))

Department

Intelligent Systems

Date

2023

Abstract

Instead of predicting a single emotion for an entire activity (e.g., video watching), fine-grained emotion recognition enables more temporally precise recognition. Previous work on fine-grained emotion recognition requires segment-by-segment, fine-grained emotion labels to train the recognition algorithm. However, experiments to collect these labels are costly and time-consuming compared with collecting only one emotion label after the user has watched the stimulus (i.e., the post-stimuli emotion label). To recognize emotions at a finer granularity when trained with only post-stimuli labels, we propose an emotion recognition algorithm based on Deep Multiple Instance Learning (EDMIL) using physiological signals. EDMIL recognizes fine-grained valence and arousal (V-A) labels by identifying which instances represent the post-stimuli V-A annotated by users after watching the videos. Instead of being fully supervised, the instances are weakly supervised by the post-stimuli labels in the training stage. The V-A of the instances is estimated from the instance gains, which indicate the probability that an instance predicts the post-stimuli labels. We tested EDMIL on three datasets, CASE, MERCA and CEAP-360VR, collected in three different environments: desktop, mobile and HMD-based Virtual Reality, respectively. Recognition results validated against the fine-grained V-A self-reports show that for subject-independent 3-class classification (high/neutral/low), EDMIL obtains promising recognition accuracies: 75.63% and 79.73% for V-A on CASE, 70.51% and 67.62% for V-A on MERCA, and 65.04% and 67.05% for V-A on CEAP-360VR. Our ablation study shows that all components of EDMIL contribute to both the classification and regression tasks. Our experiments also show that (1) compared with fully-supervised learning, weakly-supervised learning can reduce the overfitting caused by the temporal mismatch between fine-grained annotations and physiological signals, (2) instance segment lengths of 1-2 s result in the highest recognition accuracies, and (3) EDMIL performs best if the post-stimuli annotations account for less than 30% or more than 60% of the entire video-watching period.
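
The multiple-instance idea summarized in the abstract can be illustrated with a small, generic sketch. This is not the EDMIL architecture. It only shows, under simplifying assumptions, how a recording (a "bag") might be cut into fixed-length instances and how softmax weights over per-instance scores can act as a stand-in for "instance gains" when only one bag-level label is available. The function names, the toy signal, and the use of the instance mean as a score are all hypothetical choices made for illustration.

```python
import math


def split_into_instances(signal, sample_rate_hz, instance_seconds):
    """Cut a 1-D signal into non-overlapping, fixed-length instances."""
    size = int(sample_rate_hz * instance_seconds)
    if size <= 0:
        raise ValueError("instance length must cover at least one sample")
    # Drop an incomplete tail so that every instance has the same length.
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_prediction(instance_scores):
    """Attention-style MIL pooling.

    The softmax weights play the role of hypothetical 'instance gains':
    they sum to 1 and are largest for the instances that dominate the
    bag-level (post-stimuli) prediction.
    """
    gains = softmax(instance_scores)
    pooled = sum(g * s for g, s in zip(gains, instance_scores))
    return pooled, gains


# Toy example: 4 s of a made-up signal sampled at 2 Hz, with 1 s instances.
signal = [0.0, 0.0, 0.1, 0.1, 2.0, 2.0, 0.2, 0.2]
instances = split_into_instances(signal, sample_rate_hz=2, instance_seconds=1.0)

# Assumption: the mean of each instance serves as its score. A real model
# would learn this mapping from physiological features.
scores = [sum(inst) / len(inst) for inst in instances]
pooled, gains = bag_prediction(scores)
```

In a trained model, only `pooled` would be compared with the post-stimuli label during training, while `gains` would be inspected afterwards to decide which instances carry the labeled emotion.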

Subject

Annotations
deep multiple instance learning
Emotion recognition
Feature extraction
physiological signals
Physiology
Solid modeling
Task analysis
temporal ambiguity
Training

To reference this document use:

http://resolver.tudelft.nl/uuid:e978f7eb-09db-4c1a-a897-48dacb1bf57d

DOI

https://doi.org/10.1109/TAFFC.2022.3158234

Embargo date

2023-10-05

ISSN

1949-3045

Source

IEEE Transactions on Affective Computing, 14 (3), 2304-2322

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project (https://www.openaccess.nl/en/you-share-we-take-care). Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Part of collection

Institutional Repository

Document type

journal article
