Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers

© 2016 Wiley Periodicals, Inc.

Bibliographic details
Published in: Journal of computational chemistry. - 1984. - 37(2016), 30 of 15 Nov., pages 2623-2633
First author: Katouda, Michio (author)
Other authors: Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito
Format: Online article
Language: English
Published: 2016
Parent work: Journal of computational chemistry
Keywords: Journal Article; Research Support, Non-U.S. Gov't; GPGPU; K computer; NTChem; TSUBAME 2.5; electron correlation theory; massively parallel algorithm; second-order Møller-Plesset perturbation theory
LEADER 01000naa a22002652 4500
001 NLM264395492
003 DE-627
005 20231224210006.0
007 cr uuu---uuuuu
008 231224s2016 xx |||||o 00| ||eng c
024 7 |a 10.1002/jcc.24491  |2 doi 
028 5 2 |a pubmed24n0881.xml 
035 |a (DE-627)NLM264395492 
035 |a (NLM)27634573 
040 |a DE-627  |b ger  |c DE-627  |e rakwb 
041 |a eng 
100 1 |a Katouda, Michio  |e author  |4 aut 
245 1 0 |a Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers 
264 1 |c 2016 
336 |a Text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Date Completed 19.07.2018 
500 |a Date Revised 19.07.2018 
500 |a published: Print-Electronic 
500 |a Citation Status PubMed-not-MEDLINE 
520 |a A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Two improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. A peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. 
650 4 |a Journal Article 
650 4 |a Research Support, Non-U.S. Gov't 
650 4 |a GPGPU 
650 4 |a K computer 
650 4 |a NTChem 
650 4 |a TSUBAME 2.5 
650 4 |a electron correlation theory 
650 4 |a massively parallel algorithm 
650 4 |a second-order Møller-Plesset perturbation theory 
700 1 |a Naruse, Akira  |e author  |4 aut 
700 1 |a Hirano, Yukihiko  |e author  |4 aut 
700 1 |a Nakajima, Takahito  |e author  |4 aut 
773 0 8 |i Contained in  |t Journal of computational chemistry  |d 1984  |g 37(2016), 30 of 15 Nov., pages 2623-2633  |w (DE-627)NLM098138448  |x 1096-987X  |7 nnns 
773 1 8 |g volume:37  |g year:2016  |g number:30  |g day:15  |g month:11  |g pages:2623-2633 
856 4 0 |u http://dx.doi.org/10.1002/jcc.24491  |3 Full text 
912 |a GBV_USEFLAG_A 
912 |a SYSFLAG_A 
912 |a GBV_NLM 
912 |a GBV_ILN_350 
951 |a AR 
952 |d 37  |j 2016  |e 30  |b 15  |c 11  |h 2623-2633
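The abstract in this record names the RI-MP2 energy without stating it. As a rough illustration only (not the paper's algorithm and not NTChem code), the standard closed-shell RI-MP2 correlation energy is E(2) = Σ_ijab (ia|jb)[2(ia|jb) − (ib|ja)] / (ε_i + ε_j − ε_a − ε_b), where the four-index integrals are approximated as (ia|jb) ≈ Σ_P B_ia^P B_jb^P. The sketch below evaluates that sum in plain Python. The function name `ri_mp2_energy`, the nested-list layout of `B`, and the round-robin `n_workers` split of the occupied-pair loop are hypothetical stand-ins chosen for readability; the paper's dual-level hierarchical parallelization and its communication scheme are not reproduced here.

```python
"""Minimal closed-shell RI-MP2 energy sketch (illustrative, not NTChem code)."""


def ri_mp2_energy(B, eps_occ, eps_vir, n_workers=1):
    """Return the closed-shell RI-MP2 correlation energy.

    B[i][a][P] : RI three-index factors (assumed layout), already
                 contracted with the inverse square root of the
                 auxiliary-basis metric.
    eps_occ    : occupied orbital energies.
    eps_vir    : virtual orbital energies.
    n_workers  : hypothetical number of workers; the (i, j) pair loop is
                 split round-robin to mimic a distributed sum.
    """
    nocc, nvir = len(eps_occ), len(eps_vir)
    pairs = [(i, j) for i in range(nocc) for j in range(nocc)]

    def iajb(i, a, j, b):
        # RI approximation: (ia|jb) ~ sum_P B_ia^P * B_jb^P
        return sum(x * y for x, y in zip(B[i][a], B[j][b]))

    partial = [0.0] * n_workers
    for k, (i, j) in enumerate(pairs):
        worker = k % n_workers  # round-robin assignment of pair (i, j)
        for a in range(nvir):
            for b in range(nvir):
                v_ab = iajb(i, a, j, b)  # (ia|jb)
                v_ba = iajb(i, b, j, a)  # (ib|ja), exchange-like term
                denom = eps_occ[i] + eps_occ[j] - eps_vir[a] - eps_vir[b]
                partial[worker] += v_ab * (2.0 * v_ab - v_ba) / denom

    # Final reduction over workers, loosely analogous to an MPI_Allreduce
    return sum(partial)


if __name__ == "__main__":
    # One occupied orbital, one virtual orbital, one auxiliary function
    print(ri_mp2_energy([[[0.5]]], [-1.0], [0.5]))
```

For the one-orbital toy input the result is about -0.0208 (0.0625 divided by -3). Splitting the (i, j) pairs across workers changes only the order of summation, so the total agrees with the serial value up to floating-point rounding; this independence of the pair terms is what makes the sum attractive to distribute.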