Scalable and Practical Natural Gradient for Large-Scale Deep Learning
Large-scale distributed training of deep neural networks yields models with worse generalization performance because the effective mini-batch size increases. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or ad...
Detailed Description
Bibliographic Details
Published in: IEEE transactions on pattern analysis and machine intelligence. - 1979. - 44(2022), 1, dated 30 Jan., pages 404-415
First author: Osawa, Kazuki (author)
Other authors: Tsuji, Yohei; Ueno, Yuichiro; Naruse, Akira; Foo, Chuan-Sheng; Yokota, Rio
Format: Online article
Language: English
Published: 2022
Access to parent work: IEEE transactions on pattern analysis and machine intelligence
Subjects: Journal Article; Research Support, Non-U.S. Gov't