Multi-Stage Image-Language Cross-Generative Fusion Network for Video-Based Referring Expression Comprehension

Video-based referring expression comprehension is a challenging task that requires locating the referred object in each video frame of a given video. While many existing approaches treat this task as an object-tracking problem, their performance is heavily reliant on the quality of the tracking temp...

Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024), dated: 01., pages 3256-3270
Main author: Zhang, Yujia (Author)
Other authors: Li, Qianzhong, Pan, Yi, Zhao, Xiaoguang, Tan, Min
Format: Online article
Language: English
Published: 2024
Access to parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article