Multi-Stage Image-Language Cross-Generative Fusion Network for Video-Based Referring Expression Comprehension

Video-based referring expression comprehension is a challenging task that requires locating the referred object in each frame of a given video. While many existing approaches treat this task as an object-tracking problem, their performance is heavily reliant on the quality of the tracking temp...


Bibliographic Details
Published in:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024), dated: 01., pages 3256-3270
First author:	Zhang, Yujia (Author)
Other authors:	Li, Qianzhong, Pan, Yi, Zhao, Xiaoguang, Tan, Min
Format:	Online article
Language:	English
Published:	2024
Access to the parent work:	IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subject headings:	Journal Article

Available online:	Full text