Prompt-Based Modality Alignment for Effective Multi-Modal Object Re-Identification

Bibliographic details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 34(2025), day 05, pages 2450-2462
First author:	Zhang, Shizhou (author)
Other authors:	Luo, Wenlong, Cheng, De, Xing, Yinghui, Liang, Guoqiang, Wang, Peng, Zhang, Yanning
Format:	Online article
Language:	English
Published:	2025
Parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subject headings:	Journal Article


LEADER	01000caa a22002652c 4500
001	NLM386681643
003	DE-627
005	20250509185820.0
007	cr uuu---uuuuu
008	250508s2025 xx |||||o 00| ||eng c
024	7		|a 10.1109/TIP.2025.3556531 |2 doi
028	5	2	|a pubmed25n1396.xml
035			|a (DE-627)NLM386681643
035			|a (NLM)40193270
040			|a DE-627 |b ger |c DE-627 |e rakwb
041			|a eng
100	1		|a Zhang, Shizhou |e verfasserin |4 aut
245	1	0	|a Prompt-Based Modality Alignment for Effective Multi-Modal Object Re-Identification
264		1	|c 2025
336			|a Text |b txt |2 rdacontent
337			|a Computermedien |b c |2 rdamedia
338			|a Online-Ressource |b cr |2 rdacarrier
500			|a Date Revised 05.05.2025
500			|a published: Print-Electronic
500			|a Citation Status PubMed-not-MEDLINE
520			|a A critical challenge for multi-modal Object Re-Identification (ReID) is the effective aggregation of complementary information to mitigate illumination issues. State-of-the-art methods typically employ complex and highly-coupled architectures, which unavoidably result in heavy computational costs. Moreover, the significant distribution gap among different image spectra hinders the joint representation of multi-modal features. In this paper, we propose a framework named as PromptMA to establish effective communication channels between different modality paths, thereby aggregating modal complementary information and bridging the distribution gap. Specifically, we inject a series of learnable multi-modal prompts into the Image Encoder and introduce a prompt exchange mechanism to enable the prompts to alternately interact with different modal token embeddings, thus capturing and distributing multi-modal features effectively. Building on top of the multi-modal prompts, we further propose Prompt-based Token Selection (PBTS) and Prompt-based Modality Fusion (PBMF) modules to achieve effective multi-modal feature fusion while minimizing background interference. Additionally, due to the flexibility of our prompt exchange mechanism, our method is well-suited to handle scenarios with missing modalities. Extensive evaluations are conducted on four widely used benchmark datasets and the experimental results demonstrate that our method achieves state-of-the-art performances, surpassing the current benchmarks by over 15% on the challenging MSVR310 dataset and by 6% on the RGBNT201. The code is available at https://github.com/FHR-L/PromptMA
650		4	|a Journal Article
700	1		|a Luo, Wenlong |e verfasserin |4 aut
700	1		|a Cheng, De |e verfasserin |4 aut
700	1		|a Xing, Yinghui |e verfasserin |4 aut
700	1		|a Liang, Guoqiang |e verfasserin |4 aut
700	1		|a Wang, Peng |e verfasserin |4 aut
700	1		|a Zhang, Yanning |e verfasserin |4 aut
773	0	8	|i Enthalten in |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |d 1992 |g 34(2025) vom: 05., Seite 2450-2462 |w (DE-627)NLM09821456X |x 1941-0042 |7 nnas
773	1	8	|g volume:34 |g year:2025 |g day:05 |g pages:2450-2462
856	4	0	|u http://dx.doi.org/10.1109/TIP.2025.3556531 |3 Volltext
912			|a GBV_USEFLAG_A
912			|a SYSFLAG_A
912			|a GBV_NLM
912			|a GBV_ILN_350
951			|a AR
952			|d 34 |j 2025 |b 05 |h 2450-2462