Pruning Self-Attentions Into Convolutional Layers in Single Path

Vision Transformers (ViTs) have achieved impressive performance across various computer vision tasks. However, modeling global correlations with multi-head self-attention (MSA) layers leads to two widely recognized issues: massive computational resource consumption and the lack of intrinsic inductive bias...
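The computational issue noted in the abstract comes from attention scaling quadratically with the number of tokens. The sketch below is an illustrative back-of-the-envelope FLOP comparison only (not the paper's method or reported numbers; the embedding dimension and token-grid sizes are assumed for exposition), contrasting one MSA layer with a 3x3 depthwise convolution over the same tokens.

```python
def msa_flops(n_tokens: int, dim: int) -> int:
    """Rough FLOPs of one MSA layer: Q/K/V/output projections (4*N*d^2)
    plus attention-matrix computation and weighting (2*N^2*d)."""
    return 4 * n_tokens * dim ** 2 + 2 * n_tokens ** 2 * dim


def dwconv_flops(n_tokens: int, dim: int, kernel_size: int = 3) -> int:
    """Rough FLOPs of a depthwise convolution over N tokens with a
    k x k kernel: N * d * k^2 (linear in the number of tokens)."""
    return n_tokens * dim * kernel_size ** 2


if __name__ == "__main__":
    dim = 384  # assumed embedding dimension (e.g. a DeiT-Small-like setting)
    for n in (196, 784, 3136):  # token counts for 14x14, 28x28, 56x56 grids
        print(f"N={n:5d}  MSA: {msa_flops(n, dim):>14,}  "
              f"DWConv(3x3): {dwconv_flops(n, dim):>12,}")
```

Running this shows MSA cost growing quadratically with token count while the depthwise convolution grows only linearly, which is the scaling gap the abstract alludes to.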


Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46 (2024), no. 5, 04 Apr. 2024, pp. 3910-3922
First author: He, Haoyu (Author)
Other authors: Cai, Jianfei; Liu, Jing; Pan, Zizheng; Zhang, Jing; Tao, Dacheng; Zhuang, Bohan
Format: Online article
Language: English
Published: 2024
Parent publication: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subject terms: Journal Article