CycleACR : Cycle Modeling of Actor-Context Relations for Video Action Detection

Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 47(2025), no. 11, 1 Oct., pp. 10588-10603
Main author: Chen, Lei (Author)
Other authors: Tong, Zhan; Song, Yibing; Wu, Gangshan; Wang, Limin
Format: Online article
Language: English
Published: 2025
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
Description
Summary: The relation modeling between actors and scene context advances video action detection, where the correlation of multiple actors makes their action recognition challenging. Existing studies model each actor and scene relation to improve action recognition. However, scene variations and background interference limit their effectiveness. In this paper, we propose to select actor-related scene context, rather than directly leveraging the raw video scene, to improve relation modeling. We develop a Cycle Actor-Context Relation network (CycleACR) in which a symmetric graph models the actor and context relations in a bidirectional form. Specifically, our CycleACR consists of two modules: 1) Actor-to-Context Reorganization (A2C-R), which adaptively collects actor features for context feature reorganization, and 2) Context-to-Actor Enhancement (C2A-E), which dynamically utilizes the reorganized context features for actor feature enhancement. Stacking multiple CycleACR modules effectively captures high-order relations and efficiently exchanges useful information between actors and context. To fully exploit time-dependent and holistic context information, we further design a parallel local and global temporal context modeling branch. The outputs of the two branches are integrated as the final context-enhanced actor feature representations. Finally, we propose a context-aware memory bank for long-term relation modeling. The proposed bank can effectively store actor-related scene context from other clips without additional memory overhead. Compared to existing designs that focus on C2A-E, our CycleACR introduces the core design of A2C-R for more effective relation modeling. This cycle modeling enables our CycleACR to achieve state-of-the-art performance on two popular action detection datasets: AVA (40.6 mAP) and UCF101-24 (84.7 mAP). We also provide ablation studies and visualizations to show how our cycle actor-context relation modeling improves video action detection.
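Illustrative note: the two-step cycle named in the summary (context reorganized from actor features, then actors enhanced from the reorganized context) can be pictured with the rough sketch below. It is not the authors' implementation. It assumes plain scaled dot-product attention without learned projections as a stand-in for both A2C-R and C2A-E, and the function names (`attend`, `cycle_block`, `stacked_cycle`) and feature sizes are hypothetical. The temporal branches and the memory bank are not modeled.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend(queries, sources):
    """Each query vector becomes an attention-weighted mix of the source vectors."""
    scale = math.sqrt(len(sources[0]))
    mixed = []
    for q in queries:
        weights = softmax([dot(q, s) / scale for s in sources])
        mixed.append([sum(w * s[d] for w, s in zip(weights, sources))
                      for d in range(len(q))])
    return mixed


def residual_add(base, update):
    return [[x + y for x, y in zip(b, u)] for b, u in zip(base, update)]


def cycle_block(actors, context):
    """One actor -> context -> actor cycle (assumed stand-in, not the paper's modules)."""
    # Step 1, standing in for A2C-R: context locations collect actor features.
    context = residual_add(context, attend(context, actors))
    # Step 2, standing in for C2A-E: actors read the reorganized context.
    actors = residual_add(actors, attend(actors, context))
    return actors, context


def stacked_cycle(actors, context, depth=2):
    """Apply several cycles in sequence, mirroring the 'stacking' idea in the summary."""
    for _ in range(depth):
        actors, context = cycle_block(actors, context)
    return actors, context


rng = random.Random(0)
actors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(3)]     # 3 actors, 8-dim (assumed)
context = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]   # 16 context cells (assumed)
new_actors, new_context = stacked_cycle(actors, context, depth=2)
print(len(new_actors), len(new_actors[0]), len(new_context), len(new_context[0]))
```

The ordering inside `cycle_block` is the point of the sketch: the context is updated from the actors before the actors read from it, so the second step sees actor-conditioned context rather than the raw scene features.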
Description: Date Revised 06.10.2025
published: Print
Citation Status PubMed-not-MEDLINE
ISSN:1939-3539
DOI:10.1109/TPAMI.2025.3595393