LEADER |
01000naa a22002652 4500 |
001 |
NLM363706631 |
003 |
DE-627 |
005 |
20231226093924.0 |
007 |
cr uuu---uuuuu |
008 |
231226s2024 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TPAMI.2023.3327511
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1212.xml
|
035 |
|
|
|a (DE-627)NLM363706631
|
035 |
|
|
|a (NLM)37878436
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Yuan, Yuhui
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a Expediting Large-Scale Vision Transformer for Dense Prediction Without Fine-Tuning
|
264 |
|
1 |
|c 2024
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Revised 06.12.2023
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status PubMed-not-MEDLINE
|
520 |
|
|
|a In a wide range of dense prediction tasks, large-scale Vision Transformers have achieved state-of-the-art performance while requiring expensive computation. In contrast to most existing approaches that accelerate Vision Transformers for image classification, we focus on accelerating Vision Transformers for dense prediction without any fine-tuning. We present two non-parametric operators specialized for dense prediction tasks: a token clustering layer that decreases the number of tokens to expedite computation, and a token reconstruction layer that increases the number of tokens to recover high-resolution representations. To accomplish this, the following steps are taken: i) a token clustering layer is employed to cluster neighboring tokens and yield low-resolution representations with spatial structure; ii) the subsequent transformer layers are applied only to these clustered low-resolution tokens; and iii) high-resolution representations are reconstructed from the refined low-resolution representations using a token reconstruction layer. The proposed approach shows consistently promising results on six dense prediction tasks, including object detection, semantic segmentation, panoptic segmentation, instance segmentation, depth estimation, and video instance segmentation. Additionally, we validate the effectiveness of the proposed approach on very recent state-of-the-art open-vocabulary recognition methods. Furthermore, a number of recent representative approaches are benchmarked and compared on dense prediction tasks.
|
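[Editorial note] The 520 abstract above describes two non-parametric operators, a token clustering layer and a token reconstruction layer, placed around frozen transformer blocks. The following is a minimal sketch only, assuming PyTorch, square token grids, average-pool clustering, and similarity-weighted reconstruction; the function names and the frozen_transformer_blocks call are illustrative and not the authors' implementation.

# Minimal sketch (not the authors' code): non-parametric token clustering and
# reconstruction around frozen transformer layers, assuming square token grids,
# simple average-pool clustering, and similarity-weighted reconstruction.
import torch
import torch.nn.functional as F


def token_clustering(tokens, grid_size, cluster_grid_size):
    # tokens: (B, H*W, C) high-resolution tokens laid out on an H x W grid.
    # Returns (B, h*w, C) clustered tokens on a coarser h x w grid.
    B, N, C = tokens.shape
    H = W = grid_size
    h = w = cluster_grid_size
    x = tokens.transpose(1, 2).reshape(B, C, H, W)
    x = F.adaptive_avg_pool2d(x, (h, w))          # average neighboring tokens
    return x.reshape(B, C, h * w).transpose(1, 2)


def token_reconstruction(hi_res_tokens, lo_res_refined, tau=1.0):
    # Recover high-resolution representations from refined low-resolution ones:
    # each original high-res token takes a softmax-weighted (cosine-similarity)
    # average of the refined low-res tokens; no learned parameters.
    q = F.normalize(hi_res_tokens, dim=-1)        # (B, N, C) pre-clustering tokens
    k = F.normalize(lo_res_refined, dim=-1)       # (B, M, C) refined clustered tokens
    attn = torch.softmax(q @ k.transpose(1, 2) / tau, dim=-1)
    return attn @ lo_res_refined                  # (B, N, C)


# Usage sketch: cluster before the remaining (frozen) transformer blocks and
# reconstruct after them, without any fine-tuning.
B, H, C = 2, 32, 768
tokens = torch.randn(B, H * H, C)
clustered = token_clustering(tokens, grid_size=H, cluster_grid_size=16)
# refined = frozen_transformer_blocks(clustered)  # hypothetical frozen backbone call
refined = clustered                               # placeholder for this sketch
recovered = token_reconstruction(tokens, refined)
print(recovered.shape)                            # torch.Size([2, 1024, 768])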
650 |
|
4 |
|a Journal Article
|
700 |
1 |
|
|a Liang, Weicong
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Ding, Henghui
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Liang, Zhanhao
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Zhang, Chao
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Hu, Han
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on pattern analysis and machine intelligence
|d 1979
|g 46(2023), 1 vom: 25. Jan., Seite 250-266
|w (DE-627)NLM098212257
|x 1939-3539
|7 nnns
|
773 |
1 |
8 |
|g volume:46
|g year:2023
|g number:1
|g day:25
|g month:01
|g pages:250-266
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TPAMI.2023.3327511
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 46
|j 2023
|e 1
|b 25
|c 01
|h 250-266
|