GRIM : A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity


Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1979. - 44(2022), 10, 16 Oct., pages 6224-6239
First author: Niu, Wei (Author)
Other authors: Li, Zhengang, Ma, Xiaolong, Dong, Peiyan, Zhou, Gang, Qian, Xuehai, Lin, Xue, Wang, Yanzhi, Ren, Bin
Format: Online article
Language: English
Published: 2022
Access to parent work: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Journal Article; Research Support, U.S. Gov't, Non-P.H.S.
LEADER 01000naa a22002652 4500
001 NLM326808833
003 DE-627
005 20231225195410.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2021.3089687  |2 doi 
028 5 2 |a pubmed24n1089.xml 
035 |a (DE-627)NLM326808833 
035 |a (NLM)34133272 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Niu, Wei  |e verfasserin  |4 aut 
245 1 0 |a GRIM  |b A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 16.09.2022 
500 |a Date Revised 19.11.2022 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a It is appealing but challenging to achieve real-time deep neural network (DNN) inference on mobile devices, because even powerful modern mobile devices are considered "resource-constrained" when executing large-scale DNNs. This necessitates sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that facilitates real-time inference on mobile devices while preserving high sparse-model accuracy. This paper designs GRIM, a novel mobile inference acceleration framework that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) and that achieves Real-time execution and high accuracy, leveraging fine-grained structured sparse model Inference and compiler optimizations for Mobiles. We start by proposing a new fine-grained structured sparsity scheme through Block-based Column-Row (BCR) pruning. Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts: (a) the compiler optimization and code generation for real-time mobile inference; and (b) the BCR pruning optimizations for determining pruning hyperparameters and performing weight pruning. We compare GRIM with Alibaba MNN, TVM, TensorFlow-Lite, a sparse implementation based on CSR, PatDNN, and ESE (a representative FPGA inference acceleration framework for RNNs), and achieve up to 14.08× speedup. 
650 4 |a Journal Article 
650 4 |a Research Support, U.S. Gov't, Non-P.H.S. 
700 1 |a Li, Zhengang  |e verfasserin  |4 aut 
700 1 |a Ma, Xiaolong  |e verfasserin  |4 aut 
700 1 |a Dong, Peiyan  |e verfasserin  |4 aut 
700 1 |a Zhou, Gang  |e verfasserin  |4 aut 
700 1 |a Qian, Xuehai  |e verfasserin  |4 aut 
700 1 |a Lin, Xue  |e verfasserin  |4 aut 
700 1 |a Wang, Yanzhi  |e verfasserin  |4 aut 
700 1 |a Ren, Bin  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 44(2022), 10 vom: 16. Okt., Seite 6224-6239  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:44  |g year:2022  |g number:10  |g day:16  |g month:10  |g pages:6224-6239 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2021.3089687  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 44  |j 2022  |e 10  |b 16  |c 10  |h 6224-6239
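The abstract describes Block-based Column-Row (BCR) pruning: the weight matrix is partitioned into blocks, and whole columns (and rows) are pruned within each block, yielding sparsity that is fine-grained yet regular enough for efficient mobile code generation. The following is a minimal illustrative sketch of the block-wise column-pruning idea only, assuming a simple magnitude (L2-norm) criterion and a fixed per-block keep ratio; `bcr_prune`, its parameters, and the criterion are hypothetical choices for illustration, not the authors' actual algorithm, which also prunes rows and optimizes the hyperparameters per layer.

```python
import numpy as np

def bcr_prune(weight, block_rows=4, block_cols=4, keep_ratio=0.5):
    """Sketch of block-based column pruning (one half of BCR pruning):
    partition `weight` into block_rows x block_cols tiles and, inside
    each tile, zero out the columns with the smallest L2 norms."""
    W = weight.copy()
    rows, cols = W.shape
    for r in range(0, rows, block_rows):
        for c in range(0, cols, block_cols):
            block = W[r:r + block_rows, c:c + block_cols]   # view into W
            norms = np.linalg.norm(block, axis=0)           # per-column L2 norm
            n_keep = max(1, int(np.ceil(keep_ratio * block.shape[1])))
            drop = np.argsort(norms)[:block.shape[1] - n_keep]
            block[:, drop] = 0.0                            # prune whole columns
    return W

# Example: an 8x8 weight matrix with 4x4 blocks and keep_ratio=0.5
# zeroes 2 of the 4 columns in every block, i.e., 50% sparsity.
W = np.random.default_rng(0).normal(size=(8, 8))
Wp = bcr_prune(W)
sparsity = float(np.mean(Wp == 0.0))
```

Because the surviving nonzeros form whole columns inside fixed-size blocks, the sparse pattern can be encoded compactly and turned into regular loops, which is what makes this kind of sparsity compiler-friendly compared to unstructured (e.g., CSR-stored) pruning.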