CorrNet

Fine-grained emotion recognition for video watching using wearable physiological sensors

Journal Article (2020)
Author(s)

Tianyi Zhang (TU Delft - Multimedia Computing, Centrum Wiskunde & Informatica (CWI))

Abdallah El Ali (Centrum Wiskunde & Informatica (CWI))

Chen Wang (Xinhua News Agency, Beijing)

Alan Hanjalic (TU Delft - Intelligent Systems)

Pablo Cesar (TU Delft - Multimedia Computing, Centrum Wiskunde & Informatica (CWI))

Research Group
Multimedia Computing
DOI related publication
https://doi.org/10.3390/s21010052
Publication Year
2020
Language
English
Issue number
1
Volume number
21
Pages (from-to)
1-25
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recognizing user emotions while they watch short-form videos anytime and anywhere is essential for facilitating video content customization and personalization. However, most works either classify a single emotion per video stimulus, or are restricted to static, desktop environments. To address this, we propose a correlation-based emotion recognition algorithm (CorrNet) to recognize the valence and arousal (V-A) of each instance (fine-grained segment of signals) using only wearable, physiological signals (e.g., electrodermal activity, heart rate). CorrNet takes advantage of features both inside each instance (intra-modality features) and between different instances for the same video stimuli (correlation-based features). We first test our approach on an indoor-desktop affect dataset (CASE), and thereafter on an outdoor-mobile affect dataset (MERCA), which we collected using a smart wristband and a wearable eye tracker. Results show that for subject-independent binary classification (high-low), CorrNet yields promising recognition accuracies: 76.37% and 74.03% for V-A on CASE, and 70.29% and 68.15% for V-A on MERCA. Our findings show: (1) instance segment lengths between 1–4 s result in the highest recognition accuracies; (2) accuracies between laboratory-grade and wearable sensors are comparable, even under low sampling rates (≤64 Hz); (3) large amounts of neutral V-A labels, an artifact of continuous affect annotation, result in varied recognition performance.
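To make the two feature types concrete, the sketch below segments a physiological signal into fixed-length instances and derives correlation-based features by correlating each instance with the other instances of the same stimulus. This is a simplified, hypothetical illustration of the idea described in the abstract, not the paper's actual CorrNet implementation; the function names, segment length, and the use of plain Pearson correlation are assumptions for the example.

```python
import numpy as np

def segment_signal(signal, fs, seg_len_s=2.0):
    """Split a 1-D physiological signal (e.g., EDA) into fixed-length
    instances; the paper reports 1-4 s segments work best."""
    step = int(fs * seg_len_s)
    n = len(signal) // step
    return signal[: n * step].reshape(n, step)

def correlation_features(instances):
    """Pearson correlation of each instance with every other instance
    of the same stimulus -- a simplified stand-in for CorrNet's
    correlation-based features (intra-modality features would be
    computed separately, inside each instance)."""
    corr = np.corrcoef(instances)           # (n, n) correlation matrix
    n = corr.shape[0]
    mask = ~np.eye(n, dtype=bool)           # drop self-correlations
    return corr[mask].reshape(n, n - 1)     # per-instance feature vector

# Toy example: 10 s of 64 Hz EDA-like data, cut into 2 s instances.
fs = 64
rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, 6 * np.pi, fs * 10)) + 0.1 * rng.standard_normal(fs * 10)
inst = segment_signal(sig, fs)              # shape (5, 128)
feats = correlation_features(inst)          # shape (5, 4)
```

In a full pipeline, these correlation features would be concatenated with the intra-modality features of each instance and fed to a classifier predicting high/low valence and arousal per instance.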