Visual Diagnostics of Parallel Performance in Training Large-Scale DNN Models

Diagnosing the cluster-based performance of large-scale deep neural network (DNN) models during training is essential for improving training efficiency and reducing resource consumption. However, it remains challenging due to the incomprehensibility of the parallelization strategy and the sheer volu...

Bibliographic Details
Published in: IEEE transactions on visualization and computer graphics. - 1996. - 30(2024), 7, 10 June, pages 3915-3929
Main author: Wei, Yating (Author)
Other authors: Wang, Zhiyong; Wang, Zhongwei; Dai, Yong; Ou, Gongchang; Gao, Han; Yang, Haitao; Wang, Yue; Cao, Caleb Chen; Weng, Luoxuan; Lu, Jiaying; Zhu, Rongchen; Chen, Wei
Format: Online article
Language: English
Published: 2024
Access to parent work: IEEE transactions on visualization and computer graphics
Subjects: Journal Article