Multi-Stage Image-Language Cross-Generative Fusion Network for Video-Based Referring Expression Comprehension
Video-based referring expression comprehension is a challenging task that requires locating the referred object in each video frame of a given video. While many existing approaches treat this task as an object-tracking problem, their performance is heavily reliant on the quality of the tracking temp...
Veröffentlicht in: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 33(2024) vom: 01., Seite 3256-3270 |
---|---|
1. Verfasser: | |
Weitere Verfasser: | , , , |
Format: | Online-Aufsatz |
Sprache: | English |
Veröffentlicht: |
2024
|
Zugriff auf das übergeordnete Werk: | IEEE transactions on image processing : a publication of the IEEE Signal Processing Society |
Schlagworte: | Journal Article |
Online verfügbar |
Volltext |