Pruning Self-Attentions Into Convolutional Layers in Single Path

Vision Transformers (ViTs) have achieved impressive performance across various computer vision tasks. However, modeling global correlations with multi-head self-attention (MSA) layers leads to two widely recognized issues: massive computational resource consumption and the lack of intrinsic inductive bias...
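The computational issue noted in the abstract comes from attention scaling quadratically with the number of tokens. The sketch below is an illustrative back-of-the-envelope FLOP comparison only (not the paper's method or reported numbers; the embedding dimension and token-grid sizes are assumed for exposition), contrasting one MSA layer with a 3x3 depthwise convolution over the same tokens.

```python
def msa_flops(n_tokens: int, dim: int) -> int:
    """Rough FLOPs of one MSA layer: Q/K/V/output projections (4*N*d^2)
    plus attention-matrix computation and weighting (2*N^2*d)."""
    return 4 * n_tokens * dim ** 2 + 2 * n_tokens ** 2 * dim


def dwconv_flops(n_tokens: int, dim: int, kernel_size: int = 3) -> int:
    """Rough FLOPs of a depthwise convolution over N tokens with a
    k x k kernel: N * d * k^2 (linear in the number of tokens)."""
    return n_tokens * dim * kernel_size ** 2


if __name__ == "__main__":
    dim = 384  # assumed embedding dimension (e.g. a DeiT-Small-like setting)
    for n in (196, 784, 3136):  # token counts for 14x14, 28x28, 56x56 grids
        print(f"N={n:5d}  MSA: {msa_flops(n, dim):>14,}  "
              f"DWConv(3x3): {dwconv_flops(n, dim):>12,}")
```

Running this shows MSA cost growing quadratically with token count while the depthwise convolution grows only linearly, which is the scaling gap the abstract alludes to.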


Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46 (2024), no. 5, 04 Apr. 2024, pp. 3910-3922
First author: He, Haoyu (Author)
Other authors: Cai, Jianfei; Liu, Jing; Pan, Zizheng; Zhang, Jing; Tao, Dacheng; Zhuang, Bohan
Format: Online article
Language: English
Published: 2024
Parent publication: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subject terms: Journal Article