Interpreting Image Classifiers by Generating Discrete Masks

Bibliographic details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 4 (6 Apr. 2022), pp. 2019-2030
Main author: Yuan, Hao
Other authors: Cai, Lei; Hu, Xia; Wang, Jie; Ji, Shuiwang
Format: Online article
Language: English
Published: 2022
Collection: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Journal Article
Description
Abstract: Deep models are commonly treated as black boxes and lack interpretability. Here, we propose a novel approach to interpreting deep image classifiers by generating discrete masks. Our method follows the generative adversarial network formalism: the deep model to be interpreted is the discriminator, while we train a generator to explain it. The generator is trained to capture discriminative image regions that should convey the same or similar meaning as the original image from the model's perspective. It produces a probability map from which a discrete mask can be sampled; the discriminator is then used to measure the quality of the sampled mask and provide feedback for updates. Because of the sampling operations, the generator cannot be trained directly by back-propagation, so we propose to update it using policy gradient. Furthermore, we propose to incorporate gradients as auxiliary information to reduce the search space and facilitate training. We conduct both quantitative and qualitative experiments on the ILSVRC dataset. The results indicate that our method provides reasonable explanations for predictions and outperforms existing approaches. In addition, our method passes the model randomization test, indicating that it genuinely reasons about the attribution of network predictions.
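
As a rough illustration of the training scheme the abstract describes, the sketch below samples a discrete mask from a generator's probability map and updates the generator with a REINFORCE-style policy gradient, since the discrete sampling step blocks back-propagation. This is a minimal sketch assuming PyTorch; the tiny classifier, generator, reward definition, and hyper-parameters are hypothetical stand-ins, not the authors' implementation.

import torch
import torch.nn as nn
from torch.distributions import Bernoulli

# Hypothetical stand-ins: a frozen classifier (playing the "discriminator")
# and a small generator mapping an image to a per-pixel probability map.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
for p in classifier.parameters():
    p.requires_grad_(False)

generator = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),  # probability map in (0, 1)
)
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)

image = torch.rand(1, 3, 32, 32)                    # toy input image
target = classifier(image).argmax(dim=1).item()     # model's original prediction

for step in range(100):
    probs = generator(image)          # (1, 1, 32, 32) probability map
    dist = Bernoulli(probs=probs)
    mask = dist.sample()              # discrete {0, 1} mask: no gradient flows here
    masked = image * mask             # keep only the selected regions

    # Illustrative reward: the classifier should still favor the original
    # class on the masked image, while the mask stays sparse.
    log_probs_cls = classifier(masked).log_softmax(dim=1)
    reward = log_probs_cls[0, target].detach() - 0.1 * mask.mean()

    # REINFORCE surrogate loss: -reward * log pi(mask), differentiable
    # through the probability map even though the sample itself is not.
    loss = -(reward * dist.log_prob(mask).sum())
    opt.zero_grad()
    loss.backward()
    opt.step()

The paper additionally feeds classifier gradients to the generator as auxiliary information to shrink the search space; that part is omitted from this sketch.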
Description: Date Revised 07.03.2022
Published: Print-Electronic
Citation Status: PubMed-not-MEDLINE
ISSN: 1939-3539
DOI: 10.1109/TPAMI.2020.3028783