Pruning Self-Attentions Into Convolutional Layers in Single Path
Vision Transformers (ViTs) have achieved impressive performance across various computer vision tasks. However, modeling global correlations with multi-head self-attention (MSA) layers leads to two widely recognized issues: the massive computational resource consumption and the lack of intrinsic inductive bias...
Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), 5, 5 May, pages 3910-3922
Main Author: He, Haoyu (Author)
Other Authors: Cai, Jianfei, Liu, Jing, Pan, Zizheng, Zhang, Jing, Tao, Dacheng, Zhuang, Bohan
Format: Online Article
Language: English
Published: 2024
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article