Effective Training of Convolutional Neural Networks With Low-Bitwidth Weights and Activations

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 10 (14 Oct. 2022), pp. 6140-6152
Main Author: Zhuang, Bohan (Author)
Other Authors: Tan, Mingkui; Liu, Jing; Liu, Lingqiao; Reid, Ian; Shen, Chunhua
Format: Online Article
Language: English
Published: 2022
Collection: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't
Description
Abstract: This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is challenging because the quantizer is non-differentiable, which can result in substantial accuracy loss. To address this, we propose three practical approaches to improve network training: (i) progressive quantization, (ii) stochastic precision, and (iii) joint knowledge distillation. First, for progressive quantization, we propose two schemes to progressively find good local minima. In the first scheme, we optimize a network with quantized weights and only subsequently quantize its activations, in contrast to traditional methods that optimize both simultaneously. In the second scheme, we gradually decrease the bitwidth from high precision to low precision during training. Second, to alleviate the excessive training burden of these multi-round training stages, we propose a one-stage stochastic-precision strategy that randomly samples and quantizes sub-networks while keeping the remaining parts in full precision. Finally, we adopt a novel learning scheme that jointly trains a full-precision model alongside the low-precision one; the full-precision model provides hints that guide the low-precision model's training and significantly improve its performance. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods.
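
This record ships no code, but the "non-differentiability of the quantizer" the abstract cites is conventionally worked around with a straight-through estimator (STE). Below is a minimal PyTorch sketch of that trick for a uniform k-bit quantizer; the names RoundSTE and quantize_uniform and the assumption that inputs lie in [0, 1] are illustrative, not taken from the paper.

import torch

class RoundSTE(torch.autograd.Function):
    """Round in the forward pass; pass gradients through unchanged."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat rounding as the identity
        # for gradient purposes, so the quantized network stays trainable.
        return grad_output

def quantize_uniform(x, k):
    """Uniformly quantize a tensor in [0, 1] to k bits, with STE gradients."""
    levels = 2 ** k - 1
    return RoundSTE.apply(x * levels) / levels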
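
The two progressive-quantization schemes are described only at a high level in the abstract. A plausible reading of the second scheme (gradually decreasing the bitwidth during training) is a simple precision schedule like the sketch below; the linear schedule and the start/target bitwidths are assumptions, not values from the paper.

def bitwidth_schedule(epoch: int, total_epochs: int,
                      start_bits: int = 8, target_bits: int = 2) -> int:
    """Step the bitwidth down linearly from start_bits to target_bits."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return round(start_bits - frac * (start_bits - target_bits))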
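
For the one-stage stochastic-precision strategy, the abstract says sub-networks are randomly sampled and quantized while the rest stays full-precision. Here is a minimal sketch of that idea, assuming a per-layer Bernoulli choice and a placeholder min-max quantizer; fake_quantize and p_quant are invented names, not the paper's.

import torch

def fake_quantize(w: torch.Tensor, k: int = 4) -> torch.Tensor:
    # Placeholder min-max quantizer: map weights to [0, 1], round to
    # 2^k - 1 levels, and map back to the original range.
    lo, hi = w.min(), w.max()
    levels = 2 ** k - 1
    wn = (w - lo) / (hi - lo + 1e-8)
    return torch.round(wn * levels) / levels * (hi - lo) + lo

def stochastic_precision_weights(weights, p_quant: float = 0.5, k: int = 4):
    # Per forward pass, quantize each layer's weights with probability
    # p_quant; the remaining layers run in full precision.
    return [fake_quantize(w, k) if torch.rand(()) < p_quant else w
            for w in weights]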
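
Finally, the joint scheme trains a full-precision model alongside the low-precision one so the former can provide hints. A common way to express such a hint is a softened-logit KL term added to the task loss, sketched below; alpha and T are assumed hyperparameters, and detaching the full-precision logits is a design choice, neither confirmed by the paper.

import torch.nn.functional as F

def joint_kd_loss(lp_logits, fp_logits, targets, alpha=0.5, T=4.0):
    # Hard-label loss on the low-precision network's predictions.
    ce = F.cross_entropy(lp_logits, targets)
    # Soft-target "hint" from the jointly trained full-precision model;
    # detaching keeps this term from updating the full-precision teacher.
    kd = F.kl_div(
        F.log_softmax(lp_logits / T, dim=1),
        F.softmax(fp_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd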
Date Completed: 16.09.2022
Date Revised: 19.11.2022
Published: Print-Electronic
Citation Status: MEDLINE
ISSN: 1939-3539
DOI: 10.1109/TPAMI.2021.3088904