TEREE
Transformer-based emotion recognition using EEG and eye movement data
Nima Esmi (Khazar University, University Medical Center Groningen)
Asadollah Shahbahrami (University of Guilan, Khazar University)
G. Gaydadjiev (TU Delft - Computer Engineering)
Peter de Jonge (University Medical Center Groningen)
Abstract
Multimodal AI systems increasingly rely on biomedical signals such as EEG and eye movement data for emotion recognition. However, these models face challenges, including limited training data, inter-subject variability, session-specific spurious correlations, and incomplete modality representation, all of which reduce generalization and reliability. We propose TEREE, a multimodal transformer-based model that integrates temporal, spatial, and spectral EEG features with eye movement data. To mitigate session-specific artifacts, we apply Bayesian Spurious Correlation Minimization (BSCM). In addition, a holistic multimodal processing strategy enables robust handling of incomplete modality data. The model was trained and evaluated on the SEED and SEED-FRA benchmark datasets under one-to-one and multi-to-one transfer paradigms. TEREE achieved state-of-the-art performance, with average multi-to-one transfer accuracies of 97.7% on SEED and 98.8% on SEED-FRA. Ablation studies confirmed that fusing EEG with eye movement features consistently improved accuracy over unimodal baselines, and standard deviations across repeated experiments remained below 5%, indicating stable performance. By addressing inter-subject variability, spurious correlations, and incomplete modality issues, TEREE enhances the robustness and generalization of emotion recognition systems. These findings suggest that multimodal transformer-based models can substantially improve the reliability of affective computing applications such as human–computer interaction and mental health monitoring.
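The abstract does not include implementation details, so the following is a minimal, hypothetical sketch of the fusion idea it describes: projecting EEG and eye-movement feature vectors into a shared token space, encoding them jointly with a transformer, and masking out a modality when it is absent. All module names, dimensions, and the masking scheme are illustrative assumptions, not the authors' TEREE implementation; BSCM is omitted because the abstract gives no formulation for it.

```python
# Hypothetical sketch of EEG + eye-movement fusion with a transformer encoder.
# Dimensions and the missing-modality masking are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalEmotionTransformer(nn.Module):
    def __init__(self, eeg_dim=310, eye_dim=33, d_model=128,
                 nhead=4, num_layers=2, num_classes=3):
        super().__init__()
        # Project each modality's feature vector into a shared token space.
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        self.eye_proj = nn.Linear(eye_dim, d_model)
        # Learned modality embeddings distinguish the two token types.
        self.modality_emb = nn.Parameter(torch.zeros(2, d_model))
        # A CLS-style token pools information for classification.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, eeg, eye, eye_missing=None):
        # eeg: (B, eeg_dim); eye: (B, eye_dim);
        # eye_missing: optional (B,) bool mask for samples lacking eye data.
        B = eeg.size(0)
        eeg_tok = self.eeg_proj(eeg).unsqueeze(1) + self.modality_emb[0]
        eye_tok = self.eye_proj(eye).unsqueeze(1) + self.modality_emb[1]
        cls = self.cls_token.expand(B, -1, -1)
        tokens = torch.cat([cls, eeg_tok, eye_tok], dim=1)  # (B, 3, d_model)
        pad_mask = None
        if eye_missing is not None:
            # Attention ignores the eye token where that modality is absent.
            pad_mask = torch.zeros(B, 3, dtype=torch.bool, device=eeg.device)
            pad_mask[:, 2] = eye_missing
        out = self.encoder(tokens, src_key_padding_mask=pad_mask)
        return self.classifier(out[:, 0])  # classify from the CLS token

# Example: a batch in which the second sample lacks eye-movement data.
model = MultimodalEmotionTransformer()
eeg = torch.randn(2, 310)   # e.g., SEED DE features: 62 channels x 5 bands
eye = torch.randn(2, 33)    # assumed eye-movement feature dimensionality
missing = torch.tensor([False, True])
logits = model(eeg, eye, eye_missing=missing)  # (2, 3) emotion logits
```

Masking the eye-movement token at attention time is one plausible reading of the "holistic multimodal processing" strategy for incomplete data; the paper's actual mechanism may differ.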