MTMamba++ : Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel ar...


Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 47(2025), no. 11 (28 Oct.), pp. 10633-10645
Main author: Lin, Baijiong (Author)
Other authors: Jiang, Weisen, Chen, Pengguang, Liu, Shu, Chen, Ying-Cong
Format: Online article
Language: English
Published: 2025
Collection access: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM393512487
003 DE-627
005 20251007231851.0
007 cr uuu---uuuuu
008 251003s2025 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2025.3593621  |2 doi 
028 5 2 |a pubmed25n1591.xml 
035 |a (DE-627)NLM393512487 
035 |a (NLM)40729720 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Lin, Baijiong  |e verfasserin  |4 aut 
245 1 0 |a MTMamba++  |b Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders 
264 1 |c 2025 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 06.10.2025 
500 |a published: Print 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring a Mamba-based decoder. It contains two types of core blocks: self-task Mamba (STM) block and cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging state-space models, while CTM explicitly models task interactions to facilitate information exchange across tasks. We design two types of CTM block, namely F-CTM and S-CTM, to enhance cross-task interaction from feature and semantic perspectives, respectively. Extensive experiments on NYUDv2, PASCAL-Context, and Cityscapes datasets demonstrate the superior performance of MTMamba++ over CNN-based, Transformer-based, and diffusion-based methods while maintaining high computational efficiency. 
650 4 |a Journal Article 
700 1 |a Jiang, Weisen  |e verfasserin  |4 aut 
700 1 |a Chen, Pengguang  |e verfasserin  |4 aut 
700 1 |a Liu, Shu  |e verfasserin  |4 aut 
700 1 |a Chen, Ying-Cong  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 47(2025), 11 vom: 28. Okt., Seite 10633-10645  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnas 
773 1 8 |g volume:47  |g year:2025  |g number:11  |g day:28  |g month:10  |g pages:10633-10645 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2025.3593621  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 47  |j 2025  |e 11  |b 28  |c 10  |h 10633-10645