Visual Diagnostics of Parallel Performance in Training Large-Scale DNN Models

Diagnosing the cluster-based performance of large-scale deep neural network (DNN) models during training is essential for improving training efficiency and reducing resource consumption. However, it remains challenging due to the incomprehensibility of the parallelization strategy and the sheer volu...

Bibliographic details
Published in: IEEE transactions on visualization and computer graphics. - 1996. - 30(2024), 7, 8 July, pages 3915-3929
Main author: Wei, Yating (Author)
Other authors: Wang, Zhiyong, Wang, Zhongwei, Dai, Yong, Ou, Gongchang, Gao, Han, Yang, Haitao, Wang, Yue, Cao, Caleb Chen, Weng, Luoxuan, Lu, Jiaying, Zhu, Rongchen, Chen, Wei
Format: Online article
Language: English
Published: 2024
Collection access: IEEE transactions on visualization and computer graphics
Subjects: Journal Article