Matsumoto, Kazuya*; Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu
Journal of Supercomputing, 75(12), p.8115 - 8146, 2019/12
Times Cited Count: 2, Percentile: 24.73 (Computer Science, Hardware & Architecture)
A communication-avoiding generalized minimum residual method (CA-GMRES) is implemented on a hybrid CPU-GPU cluster to accelerate the iterative linear system solver in the gyrokinetic toroidal five-dimensional Eulerian code GT5D. In addition to the CA-GMRES, we implement and evaluate a modified variant of CA-GMRES (M-CA-GMRES), proposed in our previous study, which reduces the amount of floating-point calculations. This study demonstrates that the benefits of the CA-GMRES lie in its minimal number of collective communications and its highly efficient calculations based on dense matrix-matrix operations. The performance evaluation is conducted on the Reedbush-L GPU cluster, which contains four NVIDIA Tesla P100 GPUs per compute node. The evaluation results show that the M-CA-GMRES is 1.09x, 1.22x, and 1.50x faster than the CA-GMRES, the generalized conjugate residual method (GCR), and the GMRES, respectively, when 64 GPUs are used.
Nakata, Kotaro*; Hasegawa, Takuma*; Solomon, D. K.*; Miyakawa, Kazuya; Tomioka, Yuichi*; Ota, Tomoko*; Matsumoto, Takuya*; Hama, Katsuhiro; Iwatsuki, Teruki; Ono, Masahiko*; et al.
Applied Geochemistry, 104, p.60 - 70, 2019/05
Times Cited Count: 9, Percentile: 38.79 (Geochemistry & Geophysics)
no abstracts in English
Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu; Matsumoto, Kazuya*; Asahi, Yuichi*; Imamura, Toshiyuki*
Proceedings of 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2017), p.7_1 - 7_8, 2017/11
A communication-avoiding generalized minimal residual (CA-GMRES) method is applied to the gyrokinetic toroidal five-dimensional Eulerian code GT5D, and its performance is compared against the original code with a generalized conjugate residual (GCR) method on the JAEA ICEX (Haswell), the Plasma Simulator (FX100), and the Oakforest-PACS (KNL). The CA-GMRES method has higher arithmetic intensity than the GCR method and is thus suitable for future exascale architectures with limited memory and network bandwidths. In the performance evaluation, it is shown that compared with the GCR solver, its computing kernels are accelerated by , and the cost of data reduction communication is reduced from to of the total cost at 1,280 nodes.
Matsumoto, Kazuya; Asahi, Yuichi*; Ina, Takuya; Idomura, Yasuhiro
no journal, ,
We present the implementation and performance evaluation of the plasma physics simulation code GT5D on a GPU cluster. In this study, an iterative matrix solver, identified as a performance bottleneck in the code, is tuned for the GPU. The measured performance is compared with the attainable performance calculated from the roofline model. Additionally, we show an implementation that uses direct communication between GPUs in order to utilize many GPUs.
Matsumoto, Kazuya*; Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu
no journal, ,
Communication-avoiding (CA) Krylov methods are promising solutions for communication bottlenecks on supercomputers based on many-core processors or accelerators. In this work, we implemented the CA-GMRES method on a GPU cluster, the HA-PACS, and evaluated its performance on a non-symmetric matrix solver from a nuclear CFD code. The results show that the CA-GMRES method is significantly faster than conventional Krylov methods such as the GMRES method and the GCR method.
Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu; Matsumoto, Kazuya*; Asahi, Yuichi*; Imamura, Toshiyuki*
no journal, ,
We propose a modified communication-avoiding generalized minimal residual (CA-GMRES) method, which reduces both computation and memory access by 30% while keeping the same CA property as the original CA-GMRES method. These numerical properties, less communication and computation with higher arithmetic intensity, are promising features for future exascale machines with limited memory and network bandwidths. The modified CA-GMRES method is applied to a large-scale non-symmetric matrix in an implicit solver of the gyrokinetic toroidal five-dimensional Eulerian code GT5D, and its performance is estimated on the Oakforest-PACS (KNL). The numerical experiment shows that, compared with the generalized conjugate residual method, computing kernels are accelerated by 1.5x, and the cost of data reduction communication is reduced from 12.5% to 1% of the total cost at 1,280 nodes.
Ozaki, Hirokazu*; Yoshimura, Kazuya; Katayose, Yuji*; Matsumoto, Takumi*; Asaoka, Yoshihiro*; Hayashi, Seiji*
no journal, ,
no abstracts in English
Ozaki, Hirokazu*; Hayashi, Seiji*; Yoshimura, Kazuya; Katayose, Yuji*; Matsumoto, Takumi*; Asaoka, Yoshihiro*
no journal, ,
no abstracts in English