LEADER |
01000naa a22002652 4500 |
001 |
NLM326808833 |
003 |
DE-627 |
005 |
20231225195410.0 |
007 |
cr uuu---uuuuu |
008 |
231225s2022 xx |||||o 00| ||eng c |
024 |
7 |
|
|a 10.1109/TPAMI.2021.3089687
|2 doi
|
028 |
5 |
2 |
|a pubmed24n1089.xml
|
035 |
|
|
|a (DE-627)NLM326808833
|
035 |
|
|
|a (NLM)34133272
|
040 |
|
|
|a DE-627
|b ger
|c DE-627
|e rakwb
|
041 |
|
|
|a eng
|
100 |
1 |
|
|a Niu, Wei
|e verfasserin
|4 aut
|
245 |
1 |
0 |
|a GRIM
|b A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity
|
264 |
|
1 |
|c 2022
|
336 |
|
|
|a Text
|b txt
|2 rdacontent
|
337 |
|
|
|a Computermedien
|b c
|2 rdamedia
|
338 |
|
|
|a Online-Ressource
|b cr
|2 rdacarrier
|
500 |
|
|
|a Date Completed 16.09.2022
|
500 |
|
|
|a Date Revised 19.11.2022
|
500 |
|
|
|a published: Print-Electronic
|
500 |
|
|
|a Citation Status MEDLINE
|
520 |
|
|
|a It is appealing but challenging to achieve real-time deep neural network (DNN) inference on mobile devices, because even powerful modern mobile devices are considered "resource-constrained" when executing large-scale DNNs. This necessitates sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that facilitates real-time inference on mobile devices while preserving high sparse-model accuracy. This paper designs GRIM, a novel mobile inference acceleration framework that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs), and that achieves Real-time execution and high accuracy by leveraging fine-grained structured sparse model Inference and compiler optimizations for Mobiles. We start by proposing a new fine-grained structured sparsity scheme through Block-based Column-Row (BCR) pruning. Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts: (a) compiler optimization and code generation for real-time mobile inference; and (b) BCR pruning optimizations for determining pruning hyperparameters and performing weight pruning. We compare GRIM with Alibaba MNN, TVM, TensorFlow-Lite, a sparse implementation based on CSR, PatDNN, and ESE (a representative FPGA inference acceleration framework for RNNs), and achieve up to 14.08× speedup.
|
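Note: the 520 abstract above describes Block-based Column-Row (BCR) pruning only at a high level. The minimal numpy sketch below illustrates the general idea of zeroing whole columns and rows inside fixed-size blocks of a weight matrix; the function name bcr_prune, the block sizes, the keep_ratio parameter, and the L2-norm magnitude criterion are illustrative assumptions, not the paper's actual algorithm or tuned hyperparameters.

import numpy as np

def bcr_prune(weight, block_rows=16, block_cols=16, keep_ratio=0.5):
    """Minimal sketch of Block-based Column-Row (BCR) pruning.

    The weight matrix is tiled into blocks; inside each block, the whole
    columns and rows with the smallest L2 norms are zeroed out, so the
    surviving weights keep a regular per-block layout. All hyperparameters
    here are assumed for illustration.
    """
    pruned = weight.copy()
    n_rows, n_cols = weight.shape
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            # numpy slicing returns a view, so writes land in `pruned`.
            block = pruned[r0:r0 + block_rows, c0:c0 + block_cols]
            # Rank this block's columns by L2 norm and zero the weakest.
            col_norms = np.linalg.norm(block, axis=0)
            n_drop = int(block.shape[1] * (1.0 - keep_ratio))
            block[:, np.argsort(col_norms)[:n_drop]] = 0.0
            # Likewise zero the block's weakest rows.
            row_norms = np.linalg.norm(block, axis=1)
            n_drop = int(block.shape[0] * (1.0 - keep_ratio))
            block[np.argsort(row_norms)[:n_drop], :] = 0.0
    return pruned

# Example: prune a random 64x64 weight matrix and report sparsity.
w = np.random.randn(64, 64).astype(np.float32)
w_pruned = bcr_prune(w)
print(f"sparsity: {1.0 - np.count_nonzero(w_pruned) / w_pruned.size:.2f}")

Because whole rows and columns are removed per block rather than arbitrary scattered weights, the remaining nonzeros stay in a compact, predictable pattern, which is the property the abstract credits for enabling compiler code generation and real-time mobile execution.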
650 |
|
4 |
|a Journal Article
|
650 |
|
4 |
|a Research Support, U.S. Gov't, Non-P.H.S.
|
700 |
1 |
|
|a Li, Zhengang
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Ma, Xiaolong
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Dong, Peiyan
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Zhou, Gang
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Qian, Xuehai
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Lin, Xue
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Wang, Yanzhi
|e verfasserin
|4 aut
|
700 |
1 |
|
|a Ren, Bin
|e verfasserin
|4 aut
|
773 |
0 |
8 |
|i Enthalten in
|t IEEE transactions on pattern analysis and machine intelligence
|d 1979
|g 44(2022), 10 vom: 16. Okt., Seite 6224-6239
|w (DE-627)NLM098212257
|x 1939-3539
|7 nnns
|
773 |
1 |
8 |
|g volume:44
|g year:2022
|g number:10
|g day:16
|g month:10
|g pages:6224-6239
|
856 |
4 |
0 |
|u http://dx.doi.org/10.1109/TPAMI.2021.3089687
|3 Volltext
|
912 |
|
|
|a GBV_USEFLAG_A
|
912 |
|
|
|a SYSFLAG_A
|
912 |
|
|
|a GBV_NLM
|
912 |
|
|
|a GBV_ILN_350
|
951 |
|
|
|a AR
|
952 |
|
|
|d 44
|j 2022
|e 10
|b 16
|c 10
|h 6224-6239
|