Multilingual Text-to-Image Person Retrieval via Bidirectional Relation Reasoning and Aligning
Text-to-image person retrieval (TIPR) aims to identify the target person using textual descriptions, facing challenge in modality heterogeneity. Prior works have attempted to address it by developing cross-modal global or local alignment strategies. However, global methods typically overlook fine-gr...
| Publié dans: | IEEE transactions on pattern analysis and machine intelligence. - 1979. - PP(2025) vom: 10. Okt. |
|---|---|
| Auteur principal: | |
| Autres auteurs: | , , , , |
| Format: | Article en ligne |
| Langue: | English |
| Publié: |
2025
|
| Accès à la collection: | IEEE transactions on pattern analysis and machine intelligence |
| Sujets: | Journal Article |
| Résumé: | Text-to-image person retrieval (TIPR) aims to identify the target person using textual descriptions, facing challenge in modality heterogeneity. Prior works have attempted to address it by developing cross-modal global or local alignment strategies. However, global methods typically overlook fine-grained cross-modal differences, whereas local methods require prior information to explore explicit part alignments. Additionally, current methods are English-centric, restricting their application in multilingual contexts. To alleviate these issues, we pioneer a multilingual TIPR task by developing a multilingual TIPR benchmark, for which we leverage large language models for initial translations and refine them by integrating domain-specific knowledge. Correspondingly, we propose Bi-IRRA: a Bidirectional Implicit Relation Reasoning and Aligning framework to learn alignment across languages and modalities. Within Bi-IRRA, a bidirectional implicit relation reasoning module enables bidirectional prediction of masked image and text, implicitly enhancing the modeling of local relations across languages and modalities, a multi-dimensional global alignment module is integrated to bridge the modality heterogeneity. The proposed method achieves new state-of-the-art results on all multilingual TIPR datasets. Data and code are presented in https://github.com/Flame-Chasers/Bi-IRRA |
|---|---|
| Description: | Date Revised 10.10.2025 published: Print-Electronic Citation Status Publisher |
| ISSN: | 1939-3539 |
| DOI: | 10.1109/TPAMI.2025.3620139 |