Unpacking the Gap Box Against Data-Free Knowledge Distillation

Data-free knowledge distillation (DFKD) improves the student model (S) by mimicking the class probabilities of a pre-trained teacher model (T) without training data. Under such a setting, the ideal scenario is that T can help generate "good" samples from a generator (G) to maximally benefit S...
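A note for readers: the abstract hinges on the gap between the class probabilities of T and S on generator-produced samples. The following is a minimal sketch of one way such a gap could be measured, assuming a PyTorch-style teacher and student; the function name and signature are illustrative, not taken from the paper.

```python
# Minimal, illustrative sketch (not the paper's code): the "gap" discussed in
# the abstract is the divergence between the class probabilities of the
# pre-trained teacher T and the student S on generator-produced samples.
import torch
import torch.nn.functional as F

def class_probability_gap(teacher, student, generated_x, temperature=1.0):
    """Per-sample KL divergence between T's and S's class probabilities."""
    with torch.no_grad():
        p_teacher = F.softmax(teacher(generated_x) / temperature, dim=1)
    log_p_student = F.log_softmax(student(generated_x) / temperature, dim=1)
    # Samples whose gap is too large or too small are both argued to be
    # unhelpful for distilling knowledge into S.
    return F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
```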

Full description

Bibliographic details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 46(2024), issue 9, 20 Sept., pages 6280-6291
Main author: Wang, Yang (Author)
Other authors: Qian, Biao; Liu, Haipeng; Rui, Yong; Wang, Meng
Format: Online article
Language: English
Published: 2024
Collection access: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article
LEADER 01000caa a22002652c 4500
001 NLM369969944
003 DE-627
005 20250305232047.0
007 cr uuu---uuuuu
008 240322s2024 xx |||||o 00| ||eng c
024 7 |a 10.1109/TPAMI.2024.3379505  |2 doi 
028 5 2 |a pubmed25n1232.xml 
035 |a (DE-627)NLM369969944 
035 |a (NLM)38507388 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Wang, Yang  |e verfasserin  |4 aut 
245 1 0 |a Unpacking the Gap Box Against Data-Free Knowledge Distillation 
264 1 |c 2024 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Date Revised 07.08.2024 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a Data-free knowledge distillation (DFKD) improves the student model (S) by mimicking the class probabilities of a pre-trained teacher model (T) without training data. Under such a setting, the ideal scenario is that T can help generate "good" samples from a generator (G) to maximally benefit S. However, existing methods suffer from non-ideal generated samples under the disturbance of the gap (i.e., one that is either too large or too small) between the class probabilities of T and S; for example, generated samples with too large a gap may carry excessive information for S, while too small a gap leaves limited knowledge in the samples, resulting in poor generalization. Meanwhile, these methods fail to judge the "goodness" of the generated samples for S, since the fixed T is not necessarily ideal. In this paper, we aim to answer what is inside the gap box, together with how to yield "good" generated samples for DFKD. To this end, we propose a Gap-Sensitive Sample Generation (GapSSG) approach by revisiting the empirical distilled risk from a data-free perspective, which confirms the existence of an ideal teacher (T*) while theoretically implying that: (1) the gap disturbance originates from the mismatch between T and T*, hence the class probabilities of T enable an approximation to those of T*; and (2) "good" samples should maximally benefit S via T's class probabilities, since T* is unknown. Accordingly, we unpack the gap box between T and S into two findings: an inherent gap to perceive T and T*, and a derived gap to monitor S and T*. Benefiting from the derived gap, which focuses on the adaptability of the generated samples to S, we track the student's training route (a series of training epochs) to capture the category distribution of S; upon this, a regulatory factor is further devised to approximate T* over the inherent gap, so as to generate "good" samples for S. Furthermore, during the distillation process, a sample-balanced strategy is proposed for training G, to tackle the overfitting and missing-knowledge issues between the generated partial and critical samples. Theoretical and empirical studies verify the advantages of GapSSG over the state-of-the-art methods.
650 4 |a Journal Article 
700 1 |a Qian, Biao  |e verfasserin  |4 aut 
700 1 |a Liu, Haipeng  |e verfasserin  |4 aut 
700 1 |a Rui, Yong  |e verfasserin  |4 aut 
700 1 |a Wang, Meng  |e verfasserin  |4 aut 
773 0 8 |i Enthalten in  |t IEEE transactions on pattern analysis and machine intelligence  |d 1979  |g 46(2024), 9 vom: 20. Sept., Seite 6280-6291  |w (DE-627)NLM098212257  |x 1939-3539  |7 nnas 
773 1 8 |g volume:46  |g year:2024  |g number:9  |g day:20  |g month:09  |g pages:6280-6291 
856 4 0 |u http://dx.doi.org/10.1109/TPAMI.2024.3379505  |3 Volltext 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 46  |j 2024  |e 9  |b 20  |c 09  |h 6280-6291
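The 520 abstract above also mentions tracking the student's training route (a series of training epochs) to capture the category distribution of S, from which a regulatory factor is devised. The record does not specify the estimator, so the sketch below assumes a simple exponential moving average; the class, momentum value, and factor form are all hypothetical.

```python
# Hypothetical reconstruction (assumed EMA; the paper's actual estimator and
# regulatory factor are not given in this record).
import torch

class CategoryDistributionTracker:
    """Tracks the student's category distribution across training epochs."""

    def __init__(self, num_classes, momentum=0.9):
        self.dist = torch.full((num_classes,), 1.0 / num_classes)
        self.momentum = momentum

    def update(self, student_probs):
        # Fold one epoch's mean class probabilities of S into the running estimate.
        epoch_dist = student_probs.detach().mean(dim=0)
        self.dist = self.momentum * self.dist + (1 - self.momentum) * epoch_dist

    def regulatory_factor(self):
        # Illustrative factor that up-weights classes S currently under-covers,
        # in the spirit of steering generation toward the ideal teacher T*.
        inv = 1.0 / self.dist.clamp_min(1e-8)
        return inv / inv.sum()
```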