Effective Training of Convolutional Neural Networks With Low-Bitwidth Weights and Activations
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44 (2022), No. 10, 14 Oct. 2022, pp. 6140-6152
Main author:
Other authors:
Format: Online article
Language: English
Published: 2022
Collection: IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't
Abstract: This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may result in substantial accuracy loss. To address this, we propose three practical approaches, including (i) progressive quantization, (ii) stochastic precision, and (iii) joint knowledge distillation, to improve network training. First, for progressive quantization, we propose two schemes to progressively find good local minima. Specifically, we propose to first optimize a network with quantized weights and subsequently quantize activations. This is in contrast to traditional methods, which optimize them simultaneously. Furthermore, we propose a second progressive quantization scheme which gradually decreases the bitwidth from high precision to low precision during training. Second, to alleviate the excessive training burden of multi-round training stages, we further propose a one-stage stochastic precision strategy that randomly samples and quantizes sub-networks while keeping the other parts in full precision. Finally, we adopt a novel learning scheme to jointly train a full-precision model alongside the low-precision one. By doing so, the full-precision model provides hints to guide the low-precision model training and significantly improves the performance of the low-precision network. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods.
Description: Date Completed 16.09.2022; Date Revised 19.11.2022; Published: Print-Electronic; Citation Status: MEDLINE
ISSN: 1939-3539
DOI: 10.1109/TPAMI.2021.3088904
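The abstract above outlines three training ingredients: quantization through a non-differentiable quantizer, a stochastic precision strategy that quantizes only randomly sampled sub-networks, and joint knowledge distillation from a full-precision model. The following is a minimal, hypothetical PyTorch sketch of those ideas, not the authors' released code; the names `quantize_ste`, `QuantConv2d`, `joint_distill_loss`, the DoReFa-style weight scaling, and parameters such as `p_quant` and `T` are illustrative assumptions.

```python
# Illustrative sketch only: a uniform quantizer with a straight-through
# estimator (STE), a conv layer that randomly stays in full precision
# (a simple stand-in for the stochastic-precision idea), and a joint
# distillation loss guided by a full-precision model's logits.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_ste(x, bits):
    """Uniform quantization of values in [0, 1] with a straight-through
    estimator, so gradients flow through the non-differentiable rounding."""
    levels = 2 ** bits - 1
    x = torch.clamp(x, 0.0, 1.0)
    q = torch.round(x * levels) / levels
    # forward pass uses the quantized value; backward pass uses identity
    return x + (q - x).detach()


class QuantConv2d(nn.Conv2d):
    """Conv layer with quantized weights and activations. With probability
    (1 - p_quant) during training it runs in full precision, mimicking the
    random sampling of quantized sub-networks described in the abstract."""

    def __init__(self, *args, w_bits=4, a_bits=4, p_quant=0.5, **kwargs):
        super().__init__(*args, **kwargs)
        self.w_bits, self.a_bits, self.p_quant = w_bits, a_bits, p_quant

    def forward(self, x):
        if self.training and random.random() > self.p_quant:
            return super().forward(x)          # keep this layer full-precision
        # DoReFa-style scaling (an assumption): map weights to [0, 1],
        # quantize, then map back to [-1, 1]
        w = torch.tanh(self.weight)
        w = w / (2 * w.abs().max().detach()) + 0.5
        w_q = 2 * quantize_ste(w, self.w_bits) - 1
        x_q = quantize_ste(x, self.a_bits)     # assumes activations already in [0, 1]
        return F.conv2d(x_q, w_q, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


def joint_distill_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Cross-entropy on labels plus KL divergence to the full-precision
    model's softened predictions (standard knowledge-distillation form);
    the teacher may be trained jointly or its logits detached."""
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kd
```

Progressive quantization, in this sketch, would amount to a training schedule: train with `w_bits`/`a_bits` set high (or with activations unquantized), then re-instantiate the layers with lower bitwidths and continue training from the previous checkpoint.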