How Does Attention Work in Vision Transformers? A Visual Analytics Attempt 
    
    
              
              Vision transformer (ViT) extends the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attention is then applied to the sequence to learn the attention between patches. Despite ma...
          
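The abstract's pipeline (split an image into patches, flatten them into a token sequence, apply self-attention to learn patch-to-patch attention) can be sketched as follows. This is a minimal single-head illustration with NumPy, not the paper's implementation; the image size, patch size, projection width, and all function names here are illustrative assumptions.

```python
import numpy as np

def image_to_patches(img, patch):
    # img: (H, W, C). Split into non-overlapping patch x patch tiles and
    # flatten each tile into one vector -> a sequence of patch tokens.
    H, W, C = img.shape
    seq = [
        img[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch, :].reshape(-1)
        for r in range(H // patch)
        for c in range(W // patch)
    ]
    return np.stack(seq)  # (num_patches, patch * patch * C)

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over the patch sequence.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)             # rows sum to 1
    # A[i, j] is how much patch i attends to patch j.
    return A @ V, A

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8, 3))     # toy 8x8 RGB "image" (assumed size)
X = image_to_patches(img, patch=4)       # 4 patches, each 4*4*3 = 48 dims
d_model = X.shape[1]
Wq, Wk, Wv = (rng.standard_normal((d_model, 16)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
print(X.shape, A.shape)                  # (4, 48) (4, 4)
```

The attention matrix `A` (one per head in a real multi-head ViT) is exactly the kind of patch-to-patch weight structure a visual analytics tool for ViTs would inspect.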
    
                  
Bibliographic Details

| Published in: | IEEE transactions on visualization and computer graphics. - 1996. - Vol. 29 (2023), No. 6, 05 June, pp. 2888-2900 |
|---|---|
| First author: | Li, Yiran (author) |
| Other authors: | Wang, Junpeng; Dai, Xin; Wang, Liang; Yeh, Chin-Chia Michael; Zheng, Yan; Zhang, Wei; Ma, Kwan-Liu |
| Format: | Online article |
| Language: | English |
| Published: | 2023 |
| Parent work: | IEEE transactions on visualization and computer graphics |
| Subjects: | Journal Article |