Graph Convolution-Based Decoupling and Consistency-Driven Fusion for Multimodal Emotion Recognition

Journal Article (2025)
Author(s)

Yingmin Deng (Xidian University)

Chenyu Li (Xidian University)

Yu Gu (Xidian University)

He Zhang (Northwest University)

Linsong Liu (Xidian University)

Haixiang Lin (TU Delft - Mathematical Physics)

Shuang Wang (Xidian University)

Hanlin Mo (Xidian University)

DOI
https://doi.org/10.3390/electronics14153047
Publication Year
2025
Language
English
Research Group
Mathematical Physics
Journal title
Electronics (Switzerland)
Issue number
15
Volume number
14
Article number
3047
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multimodal emotion recognition (MER) is essential for understanding human emotions from diverse sources such as speech, text, and video. However, modality heterogeneity and inconsistent expression pose challenges for effective feature fusion. To address this, we propose a novel MER framework combining a Dynamic Weighted Graph Convolutional Network (DW-GCN) for feature disentanglement and a Cross-Attention Consistency-Gated Fusion (CACG-Fusion) module for robust integration. DW-GCN models complex inter-modal relationships, enabling the extraction of both common and private features. The CACG-Fusion module subsequently enhances classification performance through dynamic alignment of cross-modal cues, employing attention-based coordination and consistency-preserving gating mechanisms to optimize feature integration. Experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that our method achieves state-of-the-art performance, significantly improving the Acc-7, Acc-2, and F1 scores.
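To make the two components concrete, below is a minimal, hypothetical PyTorch sketch of the pipeline the abstract describes: a dynamic-weighted graph convolution over modality nodes, followed by a cross-attention fusion step with a consistency gate. All class names, tensor dimensions, the similarity-based edge weighting, the residual split into common and private features, and the cosine-similarity gate are illustrative assumptions for this sketch, not the paper's actual implementation.

    # Hypothetical sketch of DW-GCN-style disentanglement and
    # CACG-style fusion; module names and details are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DWGCNLayer(nn.Module):
        """One dynamic-weighted graph convolution step over modality nodes.

        Edge weights between modality nodes are recomputed from feature
        similarity (the "dynamic weighting" assumed here), then features
        are propagated with a standard graph convolution.
        """
        def __init__(self, dim: int):
            super().__init__()
            self.proj = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, num_modalities, dim)
            sim = torch.matmul(x, x.transpose(1, 2))          # pairwise similarity
            adj = F.softmax(sim / x.size(-1) ** 0.5, dim=-1)  # dynamic edge weights
            return F.relu(self.proj(torch.matmul(adj, x)))    # propagate + transform

    class CACGFusion(nn.Module):
        """Cross-attention followed by a consistency gate (illustrative only).

        The gate scales the attended features by how consistent they are
        with the query modality, approximated here by cosine similarity.
        """
        def __init__(self, dim: int, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
            attended, _ = self.attn(query, context, context)           # cross-attention
            gate = F.cosine_similarity(query, attended, dim=-1, eps=1e-8)
            return query + gate.unsqueeze(-1) * attended               # gated residual fusion

    if __name__ == "__main__":
        batch, num_modalities, dim = 2, 3, 64                 # e.g. speech, text, video
        feats = torch.randn(batch, num_modalities, dim)
        shared = DWGCNLayer(dim)(feats)                       # common features
        private = feats - shared                              # naive private residual
        fused = CACGFusion(dim)(shared, torch.cat([shared, private], dim=1))
        print(fused.shape)                                    # torch.Size([2, 3, 64])

The residual subtraction above is a deliberately naive stand-in for the paper's disentanglement objective; the actual DW-GCN presumably learns common and private subspaces jointly rather than by subtraction.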