Siamese Implicit Region Proposal Network With Compound Attention for Visual Tracking



Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 31(2022), dated: 01., pages 1882-1894
First author: Chan, Sixian (author)
Other authors: Tao, Jian, Zhou, Xiaolong, Bai, Cong, Zhang, Xiaoqin
Format: Online article
Language: English
Published: 2022
Parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subject headings: Journal Article
Description
Abstract: Recently, siamese-based trackers have achieved significant success. However, these trackers are limited by the difficulty of learning feature representations that remain consistent with the object. To address this challenge, this paper proposes a novel siamese implicit region proposal network with compound attention for visual tracking. First, an implicit region proposal (IRP) module is designed around a novel pixel-wise correlation method. This module aggregates feature information from different regions that resemble the pre-defined anchor boxes of a Region Proposal Network. In this way, adaptive feature receptive fields can be obtained by linearly fusing the features from these regions. Second, a compound attention module, consisting of channel attention and non-local attention, is proposed to help the IRP module better perceive the scale and shape of the object. The channel attention mines discriminative information about the object to handle background clutter in the template, while the non-local attention is trained to aggregate contextual information and learn the semantic extent of the object. Finally, experimental results show that the proposed tracker achieves state-of-the-art performance on six challenging benchmarks: VOT-2018, VOT-2019, OTB-100, GOT-10k, LaSOT, and TrackingNet. Furthermore, the proposed tracker runs in real time at an average speed of 72 FPS.
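The abstract names three building blocks: pixel-wise correlation, channel attention, and non-local attention. The toy, pure-Python sketch below shows generic textbook forms of each, not the paper's IRP or compound attention modules. It assumes: pixel-wise correlation treats each template position's feature vector as a 1x1 kernel over the search features; channel attention follows the squeeze-and-excitation pattern with a single hypothetical C x C weight matrix; non-local attention uses the embedded-Gaussian form with identity embeddings and a residual connection. All function names are hypothetical.

```python
"""Toy, pure-Python sketch of three generic building blocks.

Not the authors' implementation. Feature maps are nested lists
indexed as fmap[channel][row][col].
"""
import math
from typing import List

FeatureMap = List[List[List[float]]]  # shape [C][H][W]


def pixelwise_correlation(template: FeatureMap, search: FeatureMap) -> FeatureMap:
    """Use each template position's C-dim feature vector as a 1x1 kernel.

    Returns H_z * W_z output channels; channel (i * W_z + j) holds the dot
    product of template pixel (i, j) with every search position.
    """
    c = len(template)
    hz, wz = len(template[0]), len(template[0][0])
    hx, wx = len(search[0]), len(search[0][0])
    out: FeatureMap = []
    for i in range(hz):
        for j in range(wz):
            kernel = [template[k][i][j] for k in range(c)]
            out.append([[sum(kernel[k] * search[k][u][v] for k in range(c))
                         for v in range(wx)]
                        for u in range(hx)])
    return out


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def channel_attention(fmap: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Squeeze-and-excitation style gating with one hypothetical C x C layer."""
    c = len(fmap)
    # Squeeze: global average pooling of each channel to a scalar.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: linear layer plus sigmoid gives one gate in (0, 1) per channel.
    gates = [sigmoid(sum(weights[a][b] * pooled[b] for b in range(c)))
             for a in range(c)]
    # Rescale every value in channel a by its gate.
    return [[[gates[a] * val for val in row] for row in fmap[a]] for a in range(c)]


def non_local(fmap: FeatureMap) -> FeatureMap:
    """Embedded-Gaussian non-local block with identity embeddings and a residual.

    Each position is updated with a softmax-weighted sum over all positions,
    so it can aggregate context from the whole feature map.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Flatten positions in row-major order: index p = u * w + v.
    pos = [[fmap[k][u][v] for k in range(c)] for u in range(h) for v in range(w)]
    # Start from a copy of the input (the residual connection).
    out = [[[fmap[k][u][v] for v in range(w)] for u in range(h)] for k in range(c)]
    for p, xp in enumerate(pos):
        scores = [sum(a * b for a, b in zip(xp, xq)) for xq in pos]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        u, v = divmod(p, w)
        for k in range(c):
            out[k][u][v] += sum(e / z * xq[k] for e, xq in zip(exps, pos))
    return out
```

For example, a 1-channel 2x2 template correlated with a 3x3 search map yields 4 output channels of size 3x3, one per template pixel. With all-zero weights, `channel_attention` halves every value, because each gate is sigmoid(0) = 0.5. A real tracker would use learned embeddings and tensor libraries; plain lists are used here only to keep the sketch dependency-free.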
Description: Date Completed 18.02.2022
Date Revised 18.02.2022
published: Print-Electronic
Citation Status MEDLINE
ISSN: 1941-0042
DOI: 10.1109/TIP.2022.3148876