VOLO : Vision Outlooker for Visual Recognition

Recently, Vision Transformers (ViTs) have been broadly explored in visual recognition. With low efficiency in encoding fine-level features, the performance of ViTs is still inferior to the state-of-the-art CNNs when trained from scratch on a midsize dataset like ImageNet. Through experimental analys...

Bibliographic Details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 5, 12 May, pp. 6575-6586
Main author:	Yuan, Li (Author)
Other authors:	Hou, Qibin, Jiang, Zihang, Feng, Jiashi, Yan, Shuicheng
Format:	Online article
Language:	English
Published:	2023
Access to the parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article

Available online	Full text