Contrastive Transformer Hashing for Compact Video Representation

Video hashing learns compact representation by mapping video into low-dimensional Hamming space and has achieved promising performance in large-scale video retrieval. It is challenging to effectively exploit temporal and spatial structure in an unsupervised setting. To fulfill this gap, this paper p...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 32(2023) vom: 30., Seite 5992-6003
1. Verfasser: Shen, Xiaobo (VerfasserIn)
Weitere Verfasser: Zhou, Yue, Yuan, Yun-Hao, Yang, Xichen, Lan, Long, Zheng, Yuhui
Format: Online-Aufsatz
Sprache:English
Veröffentlicht: 2023
Zugriff auf das übergeordnete Werk:IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Schlagworte:Journal Article
Beschreibung
Zusammenfassung:Video hashing learns compact representation by mapping video into low-dimensional Hamming space and has achieved promising performance in large-scale video retrieval. It is challenging to effectively exploit temporal and spatial structure in an unsupervised setting. To fulfill this gap, this paper proposes Contrastive Transformer Hashing (CTH) for effective video retrieval. Specifically, CTH develops a bidirectional transformer autoencoder, based on which visual reconstruction loss is proposed. CTH is more powerful to capture bidirectional correlations among frames than conventional unidirectional models. In addition, CTH devises multi-modality contrastive loss to reveal intrinsic structure among videos. CTH constructs inter-modality and intra-modality triplet sets and proposes multi-modality contrastive loss to exploit inter-modality and intra-modality similarities simultaneously. We perform video retrieval tasks on four benchmark datasets, i.e., UCF101, HMDB51, SVW30, FCVID using the learned compact hash representation, and extensive empirical results demonstrate the proposed CTH outperforms several state-of-the-art video hashing methods
Beschreibung:Date Revised 07.11.2023
published: Print-Electronic
Citation Status PubMed-not-MEDLINE
ISSN:1941-0042
DOI:10.1109/TIP.2023.3326994