|
|
|
|
| LEADER |
01000caa a22002652c 4500 |
| 001 |
NLM392755874 |
| 003 |
DE-627 |
| 005 |
20251001232128.0 |
| 007 |
cr uuu---uuuuu |
| 008 |
250920s2025 xx |||||o 00| ||eng c |
| 024 |
7 |
|
|a 10.1109/TIP.2025.3608664
|2 doi
|
| 028 |
5 |
2 |
|a pubmed25n1586.xml
|
| 035 |
|
|
|a (DE-627)NLM392755874
|
| 035 |
|
|
|a (NLM)40966155
|
| 040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
| 041 |
|
|
|a eng
|
| 100 |
1 |
|
|a Shi, QingHongYa
|e verfasserin
|4 aut
|
| 245 |
1 |
0 |
|a Gradient and Structure Consistency in Multimodal Emotion Recognition
|
| 264 |
|
1 |
|c 2025
|
| 336 |
|
|
|a Text
|b txt
|2 rdacontent
|
| 337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
| 338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
| 500 |
|
|
|a Date Completed 29.09.2025
|
| 500 |
|
|
|a Date Revised 30.09.2025
|
| 500 |
|
|
|a published: Print
|
| 500 |
|
|
|a Citation Status MEDLINE
|
| 520 |
|
|
|a Multimodal emotion recognition is a task that integrates textual, visual, and audio data to holistically infer an individual's emotional state. Existing research predominantly focuses on exploiting modality-specific cues for joint learning, often ignoring the differences between multiple modalities in common goal learning. Due to multimodal heterogeneity, common goal learning inadvertently introduces optimization biases and interaction noise. To address the above challenges, we propose a novel approach named Gradient and Structure Consistency (GSCon). Our strategy operates at both overall and individual levels to consider balanced optimization and effective interaction, respectively. At the overall level, to avoid the optimization suppression of one modality on others, we construct a balanced gradient direction that aligns each modality's optimization direction, ensuring unbiased convergence. Simultaneously, at the individual level, to avoid the interaction noise caused by multimodal alignment, we align the spatial structure of samples in different modalities. The spatial structure of the samples will not differ due to modal heterogeneity, achieving effective inter-modal interaction. Extensive experiments on multimodal emotion recognition and multimodal intention understanding datasets demonstrate the effectiveness of the proposed method. Code is available at https://github.com/ShiQingHongYa/GSCon
|
| 650 |
|
4 |
|a Journal Article
|
| 700 |
1 |
|
|a Ye, Mang
|e verfasserin
|4 aut
|
| 700 |
1 |
|
|a Huang, Wenke
|e verfasserin
|4 aut
|
| 700 |
1 |
|
|a Du, Bo
|e verfasserin
|4 aut
|
| 700 |
1 |
|
|a Zong, Xiaofen
|e verfasserin
|4 aut
|
| 773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
|d 1992
|g 34(2025) vom: 01., Seite 6180-6191
|w (DE-627)NLM09821456X
|x 1941-0042
|7 nnas
|
| 773 |
1 |
8 |
|g volume:34
|g year:2025
|g day:01
|g pages:6180-6191
|
| 856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TIP.2025.3608664
|3 Volltext
|
| 912 |
|
|
|a GBV_USEFLAG_A
|
| 912 |
|
|
|a SYSFLAG_A
|
| 912 |
|
|
|a GBV_NLM
|
| 912 |
|
|
|a GBV_ILN_350
|
| 951 |
|
|
|a AR
|
| 952 |
|
|
|d 34
|j 2025
|b 01
|h 6180-6191
|