A Swiss Army Knife for Tracking by Natural Language Specification

Bibliographic details
Published in: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society. - 1992. - 34(2025), day: 16, pages 2254-2268
First author: Mao, Kaige (author)
Additional authors: Hong, Xiaopeng, Fan, Xiaopeng, Zuo, Wangmeng
Format: Online article
Language: English
Published: 2025
Parent work: IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM387000577
003 DE-627
005 20250509123936.0
007 cr uuu---uuuuu
008 250508s2025 xx |||||o 00| ||eng c
024 7 |a 10.1109/TIP.2025.3553290  |2 doi 
028 5 2 |a pubmed25n1376.xml 
035 |a (DE-627)NLM387000577 
035 |a (NLM)40168206 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Mao, Kaige  |e author  |4 aut 
245 1 2 |a A Swiss Army Knife for Tracking by Natural Language Specification 
264 1 |c 2025 
336 |a Text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Date Revised 16.04.2025 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Tracking by natural language specification requires trackers to jointly perform grounding and tracking tasks. Existing methods either use separate models or a single shared network, failing to account for the link and diversity between tasks jointly. In this paper, we propose a novel framework that performs dynamic task switching to customize its network path routing for each task within a unified model. For this purpose, we design a task-switchable attention module, which enables the acquisition of modal relation patterns with different dominant modalities for each task via dynamic task switching. In addition, to alleviate the inconsistency between the static language description and the dynamic target appearance during tracking, we propose a language renovation mechanism that renovates the initial language online via visual-context-aware linguistic prompting. Extensive experimental results on five datasets demonstrate that the proposed method performs favorably against state-of-the-art approaches for both grounding and tracking. Our project will be available at: https://github.com/mkg1204/SAKTrack 
650 4 |a Journal Article 
700 1 |a Hong, Xiaopeng  |e author  |4 aut 
700 1 |a Fan, Xiaopeng  |e author  |4 aut 
700 1 |a Zuo, Wangmeng  |e author  |4 aut 
773 0 8 |i Contained in  |t IEEE transactions on image processing : a publication of the IEEE Signal Processing Society  |d 1992  |g 34(2025), day: 16, pages 2254-2268  |w (DE-627)NLM09821456X  |x 1941-0042  |7 nnas 
773 1 8 |g volume:34  |g year:2025  |g day:16  |g pages:2254-2268 
856 4 0 |u http://dx.doi.org/10.1109/TIP.2025.3553290  |3 Full text 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 34  |j 2025  |b 16  |h 2254-2268