Title: Weakly-supervised Learning for Fine-grained Emotion Recognition using Physiological Signals
Author:
  Zhang, T. (TU Delft Multimedia Computing; Centrum Wiskunde & Informatica (CWI))
  El Ali, Abdallah (Centrum Wiskunde & Informatica (CWI))
  Wang, Chen (Xinhua News Agency, Beijing)
  Hanjalic, A. (TU Delft Intelligent Systems)
  Cesar, Pablo (TU Delft Multimedia Computing; Centrum Wiskunde & Informatica (CWI))
Department: Intelligent Systems
Date: 2023
Abstract: Instead of predicting just one emotion for one activity (e.g., video watching), fine-grained emotion recognition enables more temporally precise recognition. Previous works on fine-grained emotion recognition require segment-by-segment, fine-grained emotion labels to train the recognition algorithm. However, experiments to collect these labels are costly and time-consuming compared with only collecting one emotion label after the user watched that stimulus (i.e., the post-stimuli emotion labels). To recognize emotions at a finer granularity level when trained with only post-stimuli labels, we propose an emotion recognition algorithm based on Deep Multiple Instance Learning (EDMIL) using physiological signals. EDMIL recognizes fine-grained valence and arousal (V-A) labels by identifying which instances represent the post-stimuli V-A annotated by users after watching the videos. Instead of fully-supervised training, the instances are weakly-supervised by the post-stimuli labels in the training stage. The V-A of instances are estimated by the instance gains, which indicate the probability of instances to predict the post-stimuli labels. We tested EDMIL on three different datasets, CASE, MERCA and CEAP-360VR, collected in three different environments: desktop, mobile and HMD-based Virtual Reality, respectively.
Recognition results validated with the fine-grained V-A self-reports show that for subject-independent 3-class classification (high/neutral/low), EDMIL obtains promising recognition accuracies: 75.63% and 79.73% for V-A on CASE, 70.51% and 67.62% for V-A on MERCA and 65.04% and 67.05% for V-A on CEAP-360VR. Our ablation study shows that all components of EDMIL contribute to both the classification and regression tasks. Our experiments also show that (1) compared with fully-supervised learning, weakly-supervised learning can reduce the problem of overfitting caused by the temporal mismatch between fine-grained annotations and physiological signals, (2) instance segment lengths between 1-2 s result in the highest recognition accuracies and (3) EDMIL performs best if post-stimuli annotations consist of less than 30% or more than 60% of the entire video watching.
Subject: Annotations; deep multiple instance learning; Emotion recognition; Feature extraction; physiological signals; Physiology; Solid modeling; Task analysis; temporal ambiguity; Training
To reference this document use: http://resolver.tudelft.nl/uuid:e978f7eb-09db-4c1a-a897-48dacb1bf57d
DOI: https://doi.org/10.1109/TAFFC.2022.3158234
Embargo date: 2023-10-05
ISSN: 1949-3045
Source: IEEE Transactions on Affective Computing, 14 (3), 2304-2322
Bibliographical note: Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Part of collection: Institutional Repository
Document type: journal article
Rights: © 2023 T. Zhang, Abdallah El Ali, Chen Wang, A. Hanjalic, Pablo Cesar
Files: Weakly_Supervised_Learnin ... ignals.pdf (PDF, 1.56 MB)
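
Illustrative note: the weak supervision described in the abstract, where only one post-stimuli label is available for a whole recording, fits the general multiple instance learning setting (a recording is a "bag" of short "instances"). The sketch below is a generic illustration of that setting and is not the authors' EDMIL implementation. It splits a signal into fixed-length instances and pools per-instance probabilities into one bag-level probability. The function names, the 50 Hz sampling rate, the placeholder scorer and the noisy-OR pooling rule are all assumptions made for this example; none of them is stated in this record.

```python
import numpy as np


def segment_into_instances(signal, sample_rate_hz, instance_seconds):
    """Split a 1-D signal into non-overlapping, fixed-length instances.

    Returns an array of shape (n_instances, samples_per_instance).
    Trailing samples that do not fill a whole instance are dropped.
    """
    samples_per_instance = int(round(sample_rate_hz * instance_seconds))
    if samples_per_instance <= 0:
        raise ValueError("instance length must cover at least one sample")
    values = np.asarray(signal, dtype=float)
    n_instances = len(values) // samples_per_instance
    trimmed = values[: n_instances * samples_per_instance]
    return trimmed.reshape(n_instances, samples_per_instance)


def noisy_or_pool(instance_probs):
    """Pool instance probabilities into one bag probability (noisy-OR).

    Assumption for this sketch: the bag is positive if at least one
    instance is positive, so P(bag) = 1 - prod(1 - p_i).
    """
    p = np.clip(np.asarray(instance_probs, dtype=float), 0.0, 1.0)
    return float(1.0 - np.prod(1.0 - p))


# Hypothetical example: 10 s of synthetic signal sampled at 50 Hz,
# cut into 2 s instances (within the 1-2 s range the abstract mentions).
signal = np.sin(np.linspace(0.0, 20.0, 500))
bag = segment_into_instances(signal, sample_rate_hz=50, instance_seconds=2.0)

# Placeholder scorer (an assumption): a sigmoid of each instance's mean.
# A real system would use a trained network here.
instance_probs = 1.0 / (1.0 + np.exp(-bag.mean(axis=1)))

bag_prob = noisy_or_pool(instance_probs)
print(bag.shape, bag_prob)
```

During training, only the bag-level probability would be compared with the post-stimuli label; the per-instance probabilities are what a fine-grained reading would then inspect. Other pooling rules (for example max or attention-based pooling) are equally plausible choices in this setting.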