Dual-Space Normalizing Flow for Unsupervised Video Anomaly Detection



Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 34(2025), dated: 01, pages 6432-6445
First author: Leng, Jiaxu (author)
Other authors: Zhang, Yumeng, Tan, Mingpi, Kuang, Changjiang, Wu, Zhanjie, Gan, Ji, Gao, Xinbo
Format: Online article
Language: English
Published: 2025
Parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Keywords: Journal Article
Description
Abstract: Conventional reconstruction-based video anomaly detection (VAD) methods implicitly model normality in latent spaces, which is limited by the generalization ability of latent features. Normalizing Flow (NF)-based methods have been introduced to address this issue, as they explicitly model the distribution of input data and achieve strong performance in VAD. However, existing NF-based methods are confined to Euclidean space, limiting their ability to model action hierarchies. While effective at capturing local joint dynamics and short-term temporal variations, they fail to encode kinematic dependencies and long-term pose evolution, and ultimately struggle to discern ambiguous anomalies that deviate only minimally from normal motion. In contrast, hyperbolic representation learning, with its ability to model hierarchical and complex relationships among actions, offers a promising way to enhance the discriminative power between similar skeletal actions. Motivated by this, we propose a novel Dual-Space Normalizing Flow (DSNF) method. Specifically, we design a Dual-Space Parallel Graph Convolutional Network (DSPGCN) that synergistically integrates the strengths of both Euclidean and hyperbolic geometries to simultaneously capture local detail features of poses and intrinsic hierarchical relationships of actions. To enhance the model's focus on discriminative features, we design an Adaptive Weighted Approximation Mass (AWAM) loss that dynamically adjusts weights to impose stronger constraints on regions with low discriminability in the dual space, encouraging the model to focus on key discriminative features in hyperbolic space that reflect complex relationships between actions. Extensive experiments on public datasets demonstrate the effectiveness and robustness of our method in various VAD scenarios.
Description: Date Revised 09.10.2025
published: Print
Citation Status PubMed-not-MEDLINE
ISSN:1941-0042
DOI:10.1109/TIP.2025.3614006