Robust Visual Tracking via Convolutional Networks Without Training

Deep networks have been successfully applied to visual tracking by learning a generic representation offline from numerous training images. However, the offline training is time-consuming and the learned generic representation may be less discriminative for tracking specific objects. In this paper,...


Bibliographic Details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 25(2016), no. 4 (20 Apr.), pp. 1779-92
Main author: Kaihua Zhang (Author)
Other authors: Qingshan Liu, Yi Wu, Ming-Hsuan Yang
Format: Online article
Language: English
Published: 2016
Access to parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article; Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM257609059
003 DE-627
005 20231224183211.0
007 cr uuu---uuuuu
008 231224s2016 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2016.2531283  |2 doi 
028 5 2 |a pubmed24n0858.xml 
035 |a (DE-627)NLM257609059 
035 |a (NLM)26890870 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Kaihua Zhang  |e verfasserin  |4 aut 
245 1 0 |a Robust Visual Tracking via Convolutional Networks Without Training 
264 1 |c 2016 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 20.07.2016 
500 |a Date Revised 14.07.2016 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Deep networks have been successfully applied to visual tracking by learning a generic representation offline from numerous training images. However, the offline training is time-consuming and the learned generic representation may be less discriminative for tracking specific objects. In this paper, we show that, even without offline training with a large amount of auxiliary data, simple two-layer convolutional networks can be powerful enough to learn robust representations for visual tracking. In the first frame, we extract a set of normalized patches from the target region as fixed filters, which integrate a series of adaptive contextual filters surrounding the target to define a set of feature maps in the subsequent frames. These maps measure similarities between each filter and useful local intensity patterns across the target, thereby encoding its local structural information. Furthermore, all the maps together form a global representation, via which the inner geometric layout of the target is also preserved. A simple soft shrinkage method that suppresses noisy values below an adaptive threshold is employed to de-noise the global representation. Our convolutional networks have a lightweight structure and perform favorably against several state-of-the-art methods on the recent tracking benchmark data set with 50 challenging videos.
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Qingshan Liu  |e verfasserin  |4 aut 
700 1 |a Yi Wu  |e verfasserin  |4 aut 
700 1 |a Ming-Hsuan Yang  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 25(2016), 4 vom: 20. Apr., Seite 1779-92  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnns 
773 1 8 |g volume:25  |g year:2016  |g number:4  |g day:20  |g month:04  |g pages:1779-92 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2016.2531283  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 25  |j 2016  |e 4  |b 20  |c 04  |h 1779-92
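
The abstract in field 520 outlines three steps: taking normalized patches from the target region as fixed filters, convolving them with the image to obtain feature maps, and de-noising the stacked maps by soft shrinkage. The following is a minimal NumPy sketch of those steps, not the authors' implementation. The function names, the random patch sampling, the zero-mean unit-norm normalization, and the use of the median absolute response as the adaptive threshold are all assumptions made for illustration; the record does not specify them, and the contextual filters mentioned in the abstract are omitted.

```python
import numpy as np

def normalize_patch(patch):
    """Subtract the mean and scale to unit L2 norm (assumed normalization)."""
    p = patch.astype(float) - patch.mean()
    norm = np.linalg.norm(p)
    return p / norm if norm > 0 else p

def extract_filters(target, size, count, rng):
    """Sample `count` normalized size-by-size patches from the target region.

    Random sampling is an assumption; the abstract only says "a set of
    normalized patches".
    """
    h, w = target.shape
    filters = []
    for _ in range(count):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        filters.append(normalize_patch(target[y:y + size, x:x + size]))
    return np.stack(filters)

def feature_maps(region, filters):
    """Valid cross-correlation of every filter with the region."""
    size = filters.shape[-1]
    # windows has shape (H - size + 1, W - size + 1, size, size)
    windows = np.lib.stride_tricks.sliding_window_view(
        region.astype(float), (size, size)
    )
    # One similarity map per filter: shape (count, H - size + 1, W - size + 1)
    return np.einsum("ijkl,fkl->fij", windows, filters)

def soft_shrink(x, threshold):
    """Zero out values with magnitude below the threshold; shrink the rest."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

rng = np.random.default_rng(0)
target = rng.random((32, 32))          # stand-in for a grayscale target region
filters = extract_filters(target, size=6, count=10, rng=rng)
maps = feature_maps(target, filters)   # local similarity maps
threshold = np.median(np.abs(maps))    # assumed choice of adaptive threshold
representation = soft_shrink(maps, threshold)
```

Stacking the maps into one array keeps each response at its spatial position, which is one way to read the abstract's remark that the global representation preserves the target's inner geometric layout.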