Sequence as a Whole : A Unified Framework for Video Action Localization With Long-Range Text Query

Comprehensive understanding of video content requires both spatial and temporal localization. However, the field lacks a unified video action localization framework, which hinders its coordinated development. Existing 3D CNN methods take fixed and limited input length at the cost of ignori...

Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023) dated: 04., pages 1403-1418
Main author: Su, Yuting (Author)
Other authors: Wang, Weikang, Liu, Jing, Ma, Shuang, Yang, Xiaokang
Format: Online article
Language: English
Published: 2023
Access to parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article