Towards a Deeper Understanding of Global Covariance Pooling in Deep Learning : An Optimization Perspective

Global covariance pooling (GCP) as an effective alternative to global average pooling has shown good capacity to improve deep convolutional neural networks (CNNs) in a variety of vision tasks. Although promising performance, it is still an open problem on how GCP (especially its post-normalization)...

Ausführliche Beschreibung

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 45(2023), 12 vom: 02. Dez., Seite 15802-15819
1. Verfasser:	Wang, Qilong (VerfasserIn)
Weitere Verfasser:	Zhang, Zhaolin, Gao, Mingze, Xie, Jiangtao, Zhu, Pengfei, Li, Peihua, Zuo, Wangmeng, Hu, Qinghua
Format:	Online-Aufsatz
Sprache:	English
Veröffentlicht:	2023
Zugriff auf das übergeordnete Werk:	IEEE transactions on pattern analysis and machine intelligence
Schlagworte:	Journal Article

Beschreibung
Zusammenfassung:	Global covariance pooling (GCP) as an effective alternative to global average pooling has shown good capacity to improve deep convolutional neural networks (CNNs) in a variety of vision tasks. Although promising performance, it is still an open problem on how GCP (especially its post-normalization) works in deep learning. In this paper, we make the effort towards understanding the effect of GCP on deep learning from an optimization perspective. Specifically, we first analyze behavior of GCP with matrix power normalization on optimization loss and gradient computation of deep architectures. Our findings show that GCP can improve Lipschitzness of optimization loss and achieve flatter local minima, while improving gradient predictiveness and functioning as a special pre-conditioner on gradients. Then, we explore the effect of post-normalization on GCP from the model optimization perspective, which encourages us to propose a simple yet effective normalization, namely DropCov. Based on above findings, we point out several merits of deep GCP that have not been recognized previously or fully explored, including faster convergence, stronger model robustness and better generalization across tasks. Extensive experimental results using both CNNs and vision transformers on diversified vision tasks provide strong support to our findings while verifying the effectiveness of our method
Beschreibung:	Date Revised 07.11.2023 published: Print-Electronic Citation Status PubMed-not-MEDLINE
ISSN:	1939-3539
DOI:	10.1109/TPAMI.2023.3321392