Task-Aware Weakly Supervised Object Localization With Transformer

Weakly supervised object localization (WSOL) aims to predict both object locations and categories with only image-level class labels. However, most existing methods rely on class-specific image regions for localization, resulting in incomplete object localization. To alleviate this problem, we propo...

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), no. 7 (7 July), pages 9109-9121
Main author: Meng, Meng (Author)
Other authors: Zhang, Tianzhu; Zhang, Zhe; Zhang, Yongdong; Wu, Feng
Format: Online article
Language: English
Published: 2023
Collection access: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM355202816
003 DE-627
005 20250304150744.0
007 cr uuu---uuuuu
008 231226s2023 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2022.3230902  |2 doi 
028 5 2 |a pubmed25n1183.xml 
035 |a (DE-627)NLM355202816 
035 |a (NLM)37015535 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Meng, Meng  |e verfasserin  |4 aut 
245 1 0 |a Task-Aware Weakly Supervised Object Localization With Transformer 
264 1 |c 2023 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 06.06.2023 
500 |a Date Revised 06.06.2023 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Weakly supervised object localization (WSOL) aims to predict both object locations and categories with only image-level class labels. However, most existing methods rely on class-specific image regions for localization, resulting in incomplete object localization. To alleviate this problem, we propose a novel end-to-end task-aware framework with a transformer encoder-decoder architecture (TAFormer) to learn class-agnostic foreground maps, including a representation encoder, a localization decoder, and a classification decoder. The proposed TAFormer enjoys several merits. First, the three designed modules can effectively perform class-agnostic localization and classification in a task-aware manner, achieving remarkable performance for both tasks. Second, an optimal transport algorithm is proposed to provide pixel-level pseudo labels that refine the foreground maps online. To the best of our knowledge, this is the first work to explore a task-aware framework with a transformer architecture and an optimal transport algorithm to achieve accurate object localization for WSOL. Extensive experiments with four backbones on two standard benchmarks demonstrate that our TAFormer achieves favorable performance against state-of-the-art methods. Furthermore, we show that the proposed TAFormer provides higher robustness against adversarial attacks and noisy labels. 
650 4 |a Journal Article 
700 1 |a Zhang, Tianzhu  |e verfasserin  |4 aut 
700 1 |a Zhang, Zhe  |e verfasserin  |4 aut 
700 1 |a Zhang, Yongdong  |e verfasserin  |4 aut 
700 1 |a Wu, Feng  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 45(2023), 7 vom: 07. Juli, Seite 9109-9121  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnas 
773 1 8 |g volume:45  |g year:2023  |g number:7  |g day:07  |g month:07  |g pages:9109-9121 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2022.3230902  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 45  |j 2023  |e 7  |b 07  |c 07  |h 9109-9121