Self-Distillation: Towards Efficient and Compact Neural Networks

Remarkable achievements have been obtained by deep neural networks in the last several years. However, the breakthrough in neural network accuracy is always accompanied by explosive growth of computation and parameters, which severely limits model deployment. In this paper, we propose a novel knowledge distillation technique named self-distillation to address this problem. Self-distillation attaches several attention modules and shallow classifiers at different depths of neural networks and distills knowledge from the deepest classifier to the shallower classifiers. Different from conventional knowledge distillation methods, where the knowledge of the teacher model is transferred to another student model, self-distillation can be considered as knowledge transfer within the same model - from the deeper layers to the shallow layers. Moreover, the additional classifiers in self-distillation allow the neural network to work in a dynamic manner, which leads to much higher acceleration. Experiments demonstrate that self-distillation is consistently and significantly effective on various neural networks and datasets. On average, accuracy boosts of 3.49 and 2.32 percent are observed on CIFAR100 and ImageNet, respectively. Besides, experiments show that self-distillation can be combined with other model compression methods, including knowledge distillation, pruning and lightweight model design.
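The abstract describes attaching shallow classifiers at several depths of a network and distilling knowledge from the deepest classifier into the shallower ones. The sketch below is a minimal, hypothetical PyTorch rendition of that idea, assuming a toy three-block backbone; the module names, head layout, and loss weights (alpha, T) are illustrative assumptions rather than the authors' implementation, and the attention modules mentioned in the abstract are omitted for brevity.

# Illustrative sketch only: a minimal self-distillation setup in the spirit of the
# abstract (auxiliary shallow classifiers supervised by the deepest classifier).
# The backbone, heads, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistillNet(nn.Module):
    """Toy backbone split into blocks, each followed by an auxiliary classifier."""
    def __init__(self, num_classes=100):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU()),
        ])
        # One classifier per depth; the deepest head plays the "teacher" role.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(c, num_classes))
            for c in (32, 64, 128)
        ])

    def forward(self, x):
        logits = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits.append(head(x))
        return logits  # ordered shallow -> deep

def self_distillation_loss(logits, targets, alpha=0.5, T=3.0):
    """Cross-entropy on every classifier plus KL divergence from the deepest
    (detached) classifier's softened outputs to each shallower classifier."""
    teacher = logits[-1].detach()
    loss = sum(F.cross_entropy(l, targets) for l in logits)
    for shallow in logits[:-1]:
        kl = F.kl_div(F.log_softmax(shallow / T, dim=1),
                      F.softmax(teacher / T, dim=1),
                      reduction="batchmean") * (T * T)
        loss = loss + alpha * kl
    return loss

if __name__ == "__main__":
    model = SelfDistillNet(num_classes=100)
    x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 100, (4,))
    loss = self_distillation_loss(model(x), y)
    loss.backward()
    print(float(loss))

Because each depth carries its own classifier, inference can in principle exit at an earlier head once its prediction is confident enough, which is the dynamic, accelerated operation the abstract refers to.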


Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), 8, 23 Aug., pages 4388-4403
Main author: Zhang, Linfeng (Author)
Other authors: Bao, Chenglong; Ma, Kaisheng
Format: Online article
Language: English
Published: 2022
Access to the parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't
LEADER 01000naa a22002652 4500
001 NLM32291602X
003 DE-627
005 20231225183044.0
007 cr uuu---uuuuu
008 231225s2022 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2021.3067100  |2 doi 
028 5 2 |a pubmed24n1076.xml 
035 |a (DE-627)NLM32291602X 
035 |a (NLM)33735074 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Zhang, Linfeng  |e verfasserin  |4 aut 
245 1 0 |a Self-Distillation  |b Towards Efficient and Compact Neural Networks 
264 1 |c 2022 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Completed 07.07.2022 
500 |a Date Revised 09.07.2022 
500 |a published: Print-Electronic 
500 |a Citation Status MEDLINE 
520 |a Remarkable achievements have been obtained by deep neural networks in the last several years. However, the breakthrough in neural network accuracy is always accompanied by explosive growth of computation and parameters, which severely limits model deployment. In this paper, we propose a novel knowledge distillation technique named self-distillation to address this problem. Self-distillation attaches several attention modules and shallow classifiers at different depths of neural networks and distills knowledge from the deepest classifier to the shallower classifiers. Different from conventional knowledge distillation methods, where the knowledge of the teacher model is transferred to another student model, self-distillation can be considered as knowledge transfer within the same model - from the deeper layers to the shallow layers. Moreover, the additional classifiers in self-distillation allow the neural network to work in a dynamic manner, which leads to much higher acceleration. Experiments demonstrate that self-distillation is consistently and significantly effective on various neural networks and datasets. On average, accuracy boosts of 3.49 and 2.32 percent are observed on CIFAR100 and ImageNet, respectively. Besides, experiments show that self-distillation can be combined with other model compression methods, including knowledge distillation, pruning and lightweight model design. 
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
700 1 |a Bao, Chenglong  |e verfasserin  |4 aut 
700 1 |a Ma, Kaisheng  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 44(2022), 8 vom: 23. Aug., Seite 4388-4403  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnns 
773 1 8 |g volume:44  |g year:2022  |g number:8  |g day:23  |g month:08  |g pages:4388-4403 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2021.3067100  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 44  |j 2022  |e 8  |b 23  |c 08  |h 4388-4403