Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers

Bibliographic details
Published in:	Journal of computational chemistry. - 1984. - 37(2016), 30, 15 Nov., pages 2623-2633
Main author:	Katouda, Michio (author)
Other authors:	Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
Format:	Online article
Language:	English
Published:	2016
Access to the parent work:	Journal of computational chemistry
Subjects:	Journal Article; Research Support, Non-U.S. Gov't; GPGPU; K computer; NTChem; TSUBAME 2.5; electron correlation theory; massively parallel algorithm; second-order Møller-Plesset perturbation theory


LEADER	01000naa a22002652 4500
001	NLM264395492
003	DE-627
005	20231224210006.0
007	cr uuu---uuuuu
008	231224s2016 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1002/jcc.24491 \|2 doi
028	5	2	\|a pubmed24n0881.xml
035			\|a (DE-627)NLM264395492
035			\|a (NLM)27634573
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Katouda, Michio \|e verfasserin \|4 aut
245	1	0	\|a Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers
264		1	\|c 2016
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 19.07.2018
500			\|a Date Revised 19.07.2018
500			\|a published: Print-Electronic
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a © 2016 Wiley Periodicals, Inc.
520			\|a A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Two improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5.
650		4	\|a Journal Article
650		4	\|a Research Support, Non-U.S. Gov't
650		4	\|a GPGPU
650		4	\|a K computer
650		4	\|a NTChem
650		4	\|a TSUBAME 2.5
650		4	\|a electron correlation theory
650		4	\|a massively parallel algorithm
650		4	\|a second-order Møller-Plesset perturbation theory
700	1		\|a Naruse, Akira \|e verfasserin \|4 aut
700	1		\|a Hirano, Yukihiko \|e verfasserin \|4 aut
700	1		\|a Nakajima, Takahito \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t Journal of computational chemistry \|d 1984 \|g 37(2016), 30 vom: 15. Nov., Seite 2623-2633 \|w (DE-627)NLM098138448 \|x 1096-987X \|7 nnns
773	1	8	\|g volume:37 \|g year:2016 \|g number:30 \|g day:15 \|g month:11 \|g pages:2623-2633
856	4	0	\|u http://dx.doi.org/10.1002/jcc.24491 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 37 \|j 2016 \|e 30 \|b 15 \|c 11 \|h 2623-2633