Learning Recurrent Memory Activation Networks for Visual Tracking

Facilitated by deep neural networks, numerous tracking methods have made significant advances. Existing deep trackers mainly utilize independent frames to model the target appearance, while paying less attention to its temporal coherence. In this paper, we propose a recurrent memory activation network (RMAN) to exploit the untapped temporal coherence of the target appearance for visual tracking. We build the RMAN on top of the long short-term memory network (LSTM) with an additional memory activation layer. Specifically, we first use the LSTM to model the temporal changes of the target appearance. Then we selectively activate the memory blocks via the activation layer to produce a temporally coherent representation. The recurrent memory activation layer enriches the target representations from independent frames and reduces the background interference through temporal consistency. The proposed RMAN is fully differentiable and can be optimized end-to-end. To facilitate network training, we propose a temporal coherence loss together with the original binary classification loss. Extensive experimental results on standard benchmarks demonstrate that our method performs favorably against the state-of-the-art approaches.
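The paper does not publish its implementation in this record, but the architecture described in the abstract — an LSTM that models temporal appearance changes, followed by a layer that selectively activates blocks of the memory — can be sketched in NumPy. All names (`MemoryActivationLayer`, the per-block sigmoid gating, the dimensions) are illustrative assumptions, not the authors' actual design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """A standard LSTM cell: the recurrent backbone the RMAN builds on."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Single weight matrix covering the four gates (input, forget, cell, output).
        self.W = rng.standard_normal((4 * hid_dim, in_dim + hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

class MemoryActivationLayer:
    """Hypothetical activation layer: splits the hidden state into memory
    blocks and gates each block with a learned sigmoid, so only some
    blocks contribute to the temporally coherent representation."""
    def __init__(self, hid_dim, n_blocks, seed=1):
        assert hid_dim % n_blocks == 0
        rng = np.random.default_rng(seed)
        self.n_blocks = n_blocks
        self.Wg = rng.standard_normal((n_blocks, hid_dim)) * 0.1

    def forward(self, h):
        gates = sigmoid(self.Wg @ h)           # one activation gate per block
        blocks = h.reshape(self.n_blocks, -1)  # (n_blocks, block_dim)
        return (gates[:, None] * blocks).reshape(-1)

# Run a short sequence of per-frame target features through the pipeline.
in_dim, hid_dim, n_blocks, T = 8, 16, 4, 5
lstm = LSTMCell(in_dim, hid_dim)
act = MemoryActivationLayer(hid_dim, n_blocks)
h, c = np.zeros(hid_dim), np.zeros(hid_dim)
feats = np.random.default_rng(2).standard_normal((T, in_dim))
for x in feats:
    h, c = lstm.step(x, h, c)       # temporal modeling of appearance
    rep = act.forward(h)            # selectively activated representation
print(rep.shape)  # (16,)
```

In the paper the gates are trained end-to-end with the binary classification loss plus the proposed temporal coherence loss; this sketch only shows the untrained forward pass.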

Full description

Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 30 (2021), pages 725-738
Main author: Pu, Shi (Author)
Other authors: Song, Yibing, Ma, Chao, Zhang, Honggang, Yang, Ming-Hsuan
Format: Online article
Language: English
Published: 2021
Collection: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM317989758
003 DE-627
005 20250228103613.0
007 cr uuu---uuuuu
008 231225s2021 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2020.3038356  |2 doi 
028 5 2 |a pubmed25n1059.xml 
035 |a (DE-627)NLM317989758 
035 |a (NLM)33232231 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Pu, Shi  |e verfasserin  |4 aut 
245 1 0 |a Learning Recurrent Memory Activation Networks for Visual Tracking 
264 1 |c 2021 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 07.12.2020 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Facilitated by deep neural networks, numerous tracking methods have made significant advances. Existing deep trackers mainly utilize independent frames to model the target appearance, while paying less attention to its temporal coherence. In this paper, we propose a recurrent memory activation network (RMAN) to exploit the untapped temporal coherence of the target appearance for visual tracking. We build the RMAN on top of the long short-term memory network (LSTM) with an additional memory activation layer. Specifically, we first use the LSTM to model the temporal changes of the target appearance. Then we selectively activate the memory blocks via the activation layer to produce a temporally coherent representation. The recurrent memory activation layer enriches the target representations from independent frames and reduces the background interference through temporal consistency. The proposed RMAN is fully differentiable and can be optimized end-to-end. To facilitate network training, we propose a temporal coherence loss together with the original binary classification loss. Extensive experimental results on standard benchmarks demonstrate that our method performs favorably against the state-of-the-art approaches. 
650 4 |a Journal Article 
700 1 |a Song, Yibing  |e verfasserin  |4 aut 
700 1 |a Ma, Chao  |e verfasserin  |4 aut 
700 1 |a Zhang, Honggang  |e verfasserin  |4 aut 
700 1 |a Yang, Ming-Hsuan  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 30(2021) vom: 01., Seite 725-738  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnas 
773 1 8 |g volume:30  |g year:2021  |g day:01  |g pages:725-738 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2020.3038356  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 30  |j 2021  |b 01  |h 725-738