Scalable and Practical Natural Gradient for Large-Scale Deep Learning

Large-scale distributed training of deep neural networks results in models with worse generalization performance because of the increase in the effective mini-batch size. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or ad...
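As a rough illustration of the natural-gradient idea named in the title (not the paper's SP-NGD algorithm itself), the sketch below preconditions a mini-batch gradient with a damped empirical Fisher matrix on a toy logistic-regression problem; the model, data, learning rate, and damping value are illustrative assumptions.

    # Minimal natural-gradient sketch (illustrative only; not the paper's SP-NGD method).
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy logistic-regression problem: n examples, d features (assumed sizes).
    n, d = 256, 10
    X = rng.normal(size=(n, d))
    true_w = rng.normal(size=d)
    y = (X @ true_w + 0.1 * rng.normal(size=n) > 0).astype(float)

    w = np.zeros(d)
    lr, damping = 1.0, 1e-3   # illustrative hyperparameters

    for step in range(100):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))        # predicted probabilities
        per_example_grads = (p - y)[:, None] * X  # per-example loss gradients
        grad = per_example_grads.mean(axis=0)     # mini-batch gradient

        # Empirical Fisher: average outer product of per-example gradients,
        # damped so the inverse is well conditioned.
        fisher = per_example_grads.T @ per_example_grads / n + damping * np.eye(d)

        # Natural-gradient step: precondition the gradient by the inverse Fisher.
        w -= lr * np.linalg.solve(fisher, grad)

    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    print(f"final loss: {loss:.4f}")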


Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1 (30 Jan. 2022), pp. 404-415
First author: Osawa, Kazuki (Author)
Other authors: Tsuji, Yohei; Ueno, Yuichiro; Naruse, Akira; Foo, Chuan-Sheng; Yokota, Rio
Format: Online article
Language: English
Published: 2022
Access to parent work: IEEE Transactions on Pattern Analysis and Machine Intelligence
Keywords: Journal Article; Research Support, Non-U.S. Gov't