Learning to Explore Distillability and Sparsability: A Joint Framework for Model Compression

Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence. - 1979. - 45(2023), 3, 22 March, pages 3378-3395
Main author: Liu, Yufan (Author)
Other authors: Cao, Jiajiong, Li, Bing, Hu, Weiming, Maybank, Stephen
Format: Online article
Language: English
Published: 2023
In collection: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Journal Article
Description
Abstract: Deep learning shows excellent performance, usually at the expense of heavy computation. Recently, model compression has become a popular way of reducing this computation. Compression can be achieved using knowledge distillation or filter pruning. Knowledge distillation improves the accuracy of a lightweight network, while filter pruning removes redundant architecture from a cumbersome network. They are two different ways of achieving model compression, but few methods consider both of them simultaneously. In this paper, we revisit model compression and define two attributes of a model: distillability and sparsability, which reflect how much useful knowledge can be distilled and how large a pruning ratio can be obtained, respectively. Guided by our observations and considering both accuracy and model size, a dynamic distillability-and-sparsability learning framework (DDSL) is introduced for model compression. DDSL consists of a teacher, a student, and a dean. Knowledge is distilled from the teacher to guide the student. The dean controls the training process by dynamically adjusting the distillation supervision and the sparsity supervision in a meta-learning framework. An alternating direction method of multipliers (ADMM)-based knowledge-distillation-with-pruning (KDP) joint optimization algorithm is proposed to train the model. Extensive experimental results show that DDSL outperforms 24 state-of-the-art methods, including both knowledge distillation and filter pruning methods.
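
As a rough illustration of the kind of objective such a framework optimizes, the sketch below combines a standard soft-target distillation loss (Hinton et al.) with a generic group-lasso filter-sparsity penalty, weighted by two coefficients that stand in for the dean's dynamically adjusted distillation and sparsity supervision. This is a minimal PyTorch sketch under those assumptions: the function names and the alpha_distill/alpha_sparse weights are hypothetical, and it does not implement the paper's ADMM-based KDP algorithm or the meta-learned dean.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target KD loss: KL divergence between softened teacher and
    student distributions (generic formulation, not the paper's exact one)."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=1)
    log_soft_student = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)

def sparsity_loss(model):
    """Group-lasso style penalty on convolution filters, a common proxy for
    filter pruning; the paper's ADMM-based sparsity constraint differs."""
    penalty = 0.0
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            # L2 norm per output filter, summed: drives whole filters toward zero.
            penalty = penalty + module.weight.flatten(1).norm(dim=1).sum()
    return penalty

def total_loss(student_logits, teacher_logits, labels, student_model,
               alpha_distill, alpha_sparse):
    """Combined objective: task loss plus weighted distillation and sparsity
    terms. The two alphas stand in for the supervision weights a 'dean'
    controller would adjust during training."""
    ce = F.cross_entropy(student_logits, labels)
    kd = distillation_loss(student_logits, teacher_logits)
    sp = sparsity_loss(student_model)
    return ce + alpha_distill * kd + alpha_sparse * sp
```

In a full training loop, alpha_distill and alpha_sparse would be updated each iteration (in DDSL, by the dean within a meta-learning framework) rather than held fixed as in this sketch.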
Description: Date Completed 07.04.2023
Date Revised 07.04.2023
Published: Print-Electronic
Citation Status: PubMed-not-MEDLINE
ISSN: 1939-3539
DOI: 10.1109/TPAMI.2022.3185317