Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation

We address referring image segmentation, which aims to generate a mask for the object specified by a natural language expression. Many recent works utilize Transformers to extract features for the target object by aggregating the attended visual regions. However, the generic attention me...

Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023), dated: 23., pages 3054-3065
Main author: Liu, Chang (Author)
Other authors: Ding, Henghui, Zhang, Yulun, Jiang, Xudong
Format: Online article
Language: English
Published: 2023
Access to the parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article