Idomura, Yasuhiro; Watanabe, Tomohiko*; Todo, Yasushi*
Shimyureshon (Simulation), 38(2), p.79 - 86, 2019/06
We promote the research and development of exascale fusion plasma simulations on the Post-K computer towards the estimation and prediction of core plasma performance and the exploration of improved operation scenarios for the next-generation fusion experimental reactor ITER. In this paper, we review the developed exascale simulation technologies and the outcomes of validation studies on existing experimental devices, and discuss perspectives on exascale fusion plasma simulations on Post-K.
Asahi, Yuichi*; Grandgirard, V.*; Sarazin, Y.*; Donnel, P.*; Garbet, X.*; Idomura, Yasuhiro; Dif-Pradalier, G.*; Latu, G.*
Plasma Physics and Controlled Fusion, 61(6), p.065015_1 - 065015_15, 2019/05
The role of poloidal convective cells in transport processes is studied with the full-F gyrokinetic code GYSELA. For this purpose, we apply a numerical filter to convective cells and compare the simulation results with and without the filter. The energy flux driven by the magnetic drifts turns out to be reduced by a factor of about 2 once the numerical filter is applied. A careful analysis reveals that the frequency spectrum of the convective cells is well correlated with that of the turbulent Reynolds stress tensor, supporting their turbulence-driven origin. The impact of convective cells can be interpreted as a synergy between turbulence and neoclassical dynamics.
Maeyama, Shinya*; Watanabe, Tomohiko*; Idomura, Yasuhiro; Nakata, Motoki*; Nunami, Masanori*
Computer Physics Communications, 235, p.9 - 15, 2019/02
We have implemented the Sugama collision operator in the gyrokinetic Vlasov simulation code GKV with an implicit time-integration scheme. The new method is versatile and independent of the details of the linearized collision operator, by means of an operator splitting, an implicit time integrator, and an iterative Krylov subspace solver. Numerical tests demonstrate stable computation at time step sizes beyond the restriction imposed by the collision term. An efficient implementation for parallel computation on distributed memory systems is realized by using data transpose communication, which keeps the iterative solver free from inter-node communications during iteration. Consequently, the present approach simultaneously enhances computational efficiency and reduces time to solution, and significantly accelerates the total performance of the application.
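The splitting strategy can be illustrated with a minimal 1D sketch, not the GKV implementation: a collisionless advection step is advanced explicitly, while a diffusion-type stand-in for the linearized collision operator is advanced with backward Euler and solved by SciPy's GMRES. The grid, coefficients, and operator here are all illustrative assumptions.

```python
import numpy as np
from scipy.sparse import diags, identity
from scipy.sparse.linalg import gmres

n = 128
dx = 1.0 / n
c, nu, dt = 1.0, 0.05, 0.005          # dt is ~8x the explicit diffusion limit dx^2/(2*nu)

x = dx * np.arange(n)
f0 = np.exp(-100.0 * (x - 0.5) ** 2)  # initial pulse
f = f0.copy()

# periodic diffusion matrix, a crude stand-in for a linearized collision operator
D = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)).tolil()
D[0, n - 1] = D[n - 1, 0] = 1.0
D = (nu / dx ** 2) * D.tocsr()
A = (identity(n) - dt * D).tocsr()    # backward-Euler "collision" step matrix

for _ in range(200):
    f = f - c * dt / dx * (f - np.roll(f, 1))  # explicit upwind advection (CFL = 0.64)
    f, info = gmres(A, f)                      # implicit collision step via a Krylov solver
    assert info == 0
```

Despite the time step being roughly eight times the explicit diffusion stability limit, the split scheme remains stable, which is the point of treating the stiff collision term implicitly.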
Onodera, Naoyuki; Idomura, Yasuhiro; Yussuf, A.*; Shimokawabe, Takashi*
Proceedings of 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2018) (Internet), p.9 - 16, 2018/11
We develop a communication-reduced multi-time-step (CRMT) algorithm for a lattice Boltzmann method (LBM) based on block-structured adaptive mesh refinement (AMR). The algorithm is based on the temporal blocking method and improves computational efficiency by replacing a communication bottleneck with additional computation. The proposed method is implemented in the extreme-scale airflow simulation code CityLBM, and its impact on scalability is tested on the GPU-based supercomputers TSUBAME and Reedbush. Thanks to the CRMT algorithm, the communication cost is reduced, and weak and strong scaling are improved. The obtained performance indicates that real-time airflow simulations for an area of about 2 km square at 1 m resolution are feasible.
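The temporal blocking idea behind CRMT can be sketched in one dimension, assuming a simple 3-point averaging stencil in place of the actual LBM update (all names and sizes here are illustrative, not from CityLBM):

```python
import numpy as np

def step(u):
    # one periodic 3-point averaging sweep, a stand-in for an LBM update
    return (np.roll(u, 1) + u + np.roll(u, -1)) / 3.0

def blocked_subdomain(u, lo, hi, k):
    # temporal blocking: gather a halo of width k in ONE exchange, then run
    # k purely local sweeps; the valid region shrinks by one cell per sweep,
    # so no further halo communication is needed during the block
    idx = np.arange(lo - k, hi + k) % len(u)   # periodic halo gather
    v = u[idx]
    for _ in range(k):
        v = (v[:-2] + v[1:-1] + v[2:]) / 3.0
    return v                                   # values of u[lo:hi] after k steps

u = np.random.default_rng(1).random(32)
k = 4
ref = u.copy()
for _ in range(k):
    ref = step(ref)

# one exchange + k local sweeps reproduces k global steps on the subdomain
assert np.allclose(blocked_subdomain(u, 8, 16, k), ref[8:16])
```

Exchanging a halo of width k once per block replaces k separate exchanges, trading redundant computation at the block edges for fewer communications.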
Idomura, Yasuhiro; Ina, Takuya*; Yamashita, Susumu; Onodera, Naoyuki; Yamada, Susumu; Imamura, Toshiyuki*
Proceedings of 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2018) (Internet), p.17 - 24, 2018/11
A communication-avoiding (CA) multigrid preconditioned conjugate gradient method (CAMGCG) is applied to the pressure Poisson equation in the multiphase CFD code JUPITER, and its computational performance and convergence properties are compared against those of CA Krylov methods. In the JUPITER code, the CAMGCG solver has robust convergence properties regardless of the problem size, and shows both communication reduction and convergence improvement, leading to higher performance gain than CA Krylov solvers, which achieve only the former. The CAMGCG solver is applied to extreme-scale multiphase CFD simulations with billions of DOFs, and it is shown that, compared with a preconditioned CG solver, the number of iterations is greatly reduced and a speedup is achieved while keeping excellent strong scaling up to 8,000 nodes on the Oakforest-PACS.
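A minimal two-grid sketch of a multigrid-preconditioned CG solve, using NumPy/SciPy on a 1D Poisson problem rather than the JUPITER pressure Poisson equation (the smoother weight, grid sizes, and transfer operators are illustrative):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, cg

def poisson(m):
    # 1D Poisson matrix on m interior points of the unit interval
    return diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m)) * (m + 1) ** 2

def prolong(vc):
    # linear interpolation from a coarse grid (m points) to a fine grid (2m+1)
    m = len(vc)
    vf = np.zeros(2 * m + 1)
    vf[1::2] = vc
    vf[2:-1:2] = 0.5 * (vc[:-1] + vc[1:])
    vf[0], vf[-1] = 0.5 * vc[0], 0.5 * vc[-1]
    return vf

def restrict(vf):
    # full weighting, the (scaled) transpose of prolong
    return 0.25 * vf[:-2:2] + 0.5 * vf[1::2] + 0.25 * vf[2::2]

n = 63
A = poisson(n).tocsr()
Ac = poisson(n // 2).toarray()        # coarse-grid operator (31 points)
diag = A.diagonal()

def two_grid(r):
    # one symmetric two-grid V-cycle used as the preconditioner M ~ A^-1
    x = np.zeros_like(r)
    for _ in range(2):                # weighted-Jacobi pre-smoothing
        x += (2.0 / 3.0) * (r - A @ x) / diag
    x += prolong(np.linalg.solve(Ac, restrict(r - A @ x)))  # coarse correction
    for _ in range(2):                # post-smoothing
        x += (2.0 / 3.0) * (r - A @ x) / diag
    return x

b = np.ones(n)
its = {"plain": 0, "mg": 0}
x0, info0 = cg(A, b, callback=lambda xk: its.__setitem__("plain", its["plain"] + 1))
M = LinearOperator((n, n), matvec=two_grid)
x1, info1 = cg(A, b, M=M, callback=lambda xk: its.__setitem__("mg", its["mg"] + 1))

# the multigrid-preconditioned solve needs far fewer iterations
assert info0 == 0 and info1 == 0 and its["mg"] < its["plain"]
```

Each CG iteration costs global reductions, so cutting the iteration count, as the multigrid preconditioner does, directly cuts collective communication on top of what CA techniques save.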
Onodera, Naoyuki; Idomura, Yasuhiro
Proceedings of 26th International Conference on Nuclear Engineering (ICONE-26) (Internet), 7 Pages, 2018/07
A large-scale simulation of the environmental dynamics of radioactive substances is very important from the viewpoint of nuclear security. Recently, GPUs have been emerging as high-performance devices for realizing large-scale simulations with less power consumption. We design a plume dispersion simulation based on an AMR-based LBM and measure the performance of the LBM code on the GPU-rich supercomputer TSUBAME 3.0 at Tokyo Tech. We achieved good weak scaling from 4 GPUs to 144 GPUs, and 30 times higher node performance than that of CPUs. The code is validated against a wind tunnel test released by the National Institute of Advanced Industrial Science and Technology (AIST). The computational grids are subdivided by the AMR method, and the total number of grid points is reduced to less than 10% of that of the finest uniform mesh. In spite of the fewer grid points, the turbulent statistics and plume dispersion are in good agreement with the experimental data.
Matsuoka, Seikichi; Idomura, Yasuhiro; Satake, Shinsuke*
Physics of Plasmas, 25(2), p.022510_1 - 022510_10, 2018/02
Global full-f gyrokinetic simulations, in which the gyrokinetic equation is solved from first principles without scale separation with respect to the plasma distribution function, are attracting much attention in plasma transport simulation studies. In this work, in order to apply the global full-f gyrokinetic simulation code GT5D to stellarator plasmas with complicated three-dimensional magnetic field configurations, we extend the finite difference scheme of GT5D and develop a new interface code that incorporates three-dimensional magnetic equilibria provided by the standard equilibrium code VMEC. A series of benchmark calculations is carried out for the numerical verification of GT5D. It is successfully demonstrated that GT5D well reproduces the results of a theoretical analysis and of another global neoclassical transport code.
Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu; Imamura, Toshiyuki*
Lecture Notes in Computer Science 10776, p.257 - 273, 2018/00
A preconditioned Chebyshev basis communication-avoiding conjugate gradient method (P-CBCG) is applied to the pressure Poisson equation in the multiphase thermal-hydraulic CFD code JUPITER, and its computational performance and convergence properties are compared against a preconditioned conjugate gradient (P-CG) method and a preconditioned communication-avoiding conjugate gradient (P-CACG) method on the Oakforest-PACS, which consists of 8,208 KNLs. The P-CBCG method reduces the number of collective communications while keeping robust convergence properties. Compared with the P-CACG method, communication-avoiding steps an order of magnitude larger are enabled by the improved robustness. It is shown that the P-CBCG method is faster than both the P-CG and P-CACG methods at 2,000 processors.
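The conditioning argument behind the Chebyshev basis can be demonstrated in a few lines, assuming a 1D Poisson matrix and a Gershgorin eigenvalue bound (illustrative, not the JUPITER matrices):

```python
import numpy as np

def gram_cond(vectors):
    B = np.column_stack(vectors)
    return np.linalg.cond(B.T @ B)   # in parallel, B^T B needs one global reduction

n = 63
h2 = (n + 1) ** 2
A = h2 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))  # 1D Poisson matrix
r = np.random.default_rng(0).random(n)
s = 8                                # number of communication-avoiding steps

# monomial basis r, Ar, ..., A^(s-1) r: one matrix-power kernel, but ill-conditioned
mono = [r]
for _ in range(s - 1):
    mono.append(A @ mono[-1])

# Chebyshev basis on an estimated spectral interval [0, lmax]
lmax = 4.0 * h2                      # Gershgorin bound on the largest eigenvalue
Z = lambda v: (2.0 * (A @ v) - lmax * v) / lmax  # maps the spectrum into [-1, 1]
cheb = [r, Z(r)]
for _ in range(s - 2):
    cheb.append(2.0 * Z(cheb[-1]) - cheb[-2])    # three-term Chebyshev recurrence

# same Krylov subspace, but a far better-conditioned basis
assert gram_cond(cheb) < gram_cond(mono)
```

The better-conditioned basis is what lets an s-step method take many steps between collective communications without losing convergence robustness.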
Onodera, Naoyuki; Idomura, Yasuhiro
Lecture Notes in Computer Science 10776, p.128 - 145, 2018/00
We developed a CFD code based on the adaptive-mesh-refined lattice Boltzmann method (AMR-LBM). The code is developed on the GPU-rich supercomputer TSUBAME 3.0 at Tokyo Tech, and the GPU kernel functions are tuned to achieve high performance on the Pascal GPU architecture. Weak scaling performance from 1 node to 36 nodes is examined. The GPUs (NVIDIA Tesla P100) achieve more than 10 times higher node performance than that of the CPUs (Broadwell).
Yamashita, Susumu; Ina, Takuya*; Idomura, Yasuhiro; Yoshida, Hiroyuki
Dai-31-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Proceedings of the 31st Computational Fluid Dynamics Symposium) (DVD-ROM), 7 Pages, 2017/12
no abstracts in English
Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu; Matsumoto, Kazuya*; Asahi, Yuichi*; Imamura, Toshiyuki*
Proceedings of 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2017), p.7_1 - 7_8, 2017/11
A communication-avoiding generalized minimal residual (CA-GMRES) method is applied to the gyrokinetic toroidal five-dimensional Eulerian code GT5D, and its performance is compared against that of the original code with a generalized conjugate residual (GCR) method on the JAEA ICEX (Haswell), the Plasma Simulator (FX100), and the Oakforest-PACS (KNL). The CA-GMRES method has higher arithmetic intensity than the GCR method and is thus suitable for future exascale architectures with limited memory and network bandwidths. In the performance evaluation, it is shown that, compared with the GCR solver, the computing kernels are accelerated and the fraction of the total cost spent on data reduction communication is reduced at 1,280 nodes.
Yamashita, Susumu; Ina, Takuya; Idomura, Yasuhiro; Yoshida, Hiroyuki
Nuclear Engineering and Design, 322, p.301 - 312, 2017/10
In recent years, significant attention has been paid to the precise determination of relocation of molten materials in reactor pressure vessels of boiling water reactors (BWRs) during severe accidents. To address this problem, we have developed a computational fluid dynamics code JUPITER, based on thermal-hydraulic equations and multi-phase simulation models. Although the Poisson solver has previously been a performance bottleneck in the JUPITER code, this is resolved by a new hybrid parallel Poisson solver, whose strong scaling is extended up to 200k cores on the K-computer. As a result of the improved computational capability, the problem size and physical models are dramatically expanded. A series of verification and validation studies are enabled, which are in agreement with previous numerical simulations and experiments. These physical and computational capabilities of JUPITER enable us to investigate molten material behaviors in reactor relevant situations.
Asahi, Yuichi*; Grandgirard, V.*; Idomura, Yasuhiro; Garbet, X.*; Latu, G.*; Sarazin, Y.*; Dif-Pradalier, G.*; Donnel, P.*; Ehrlacher, C.*
Physics of Plasmas, 24(10), p.102515_1 - 102515_17, 2017/10
Two full-F global gyrokinetic codes are benchmarked to compute flux-driven ion temperature gradient (ITG) turbulence in tokamak plasmas. For this purpose, the semi-Lagrangian code GYSELA and the Eulerian code GT5D are employed, which solve the full-F gyrokinetic equation with a realistic fixed flux condition. Using appropriate settings for the boundary and initial conditions, flux-driven ITG turbulence simulations are carried out. The avalanche-like transport is assessed with a focus on spatio-temporal properties. A statistical analysis is performed to discuss these self-organized criticality (SOC) like behaviors, and both codes exhibit the same power-law frequency spectra, including a transition at the high-frequency side. Based on these benchmarks, it is verified that the SOC-like behavior is robust and does not depend on numerics.
Matsuoka, Seikichi; Idomura, Yasuhiro; Satake, Shinsuke*
Physics of Plasmas, 24(10), p.102522_1 - 102522_9, 2017/10
In axisymmetric tokamak plasmas, the effects of three-dimensional non-axisymmetric magnetic field perturbations caused by error fields and other sources have attracted much attention from the viewpoint of controlling plasma performance and instabilities. Recent studies pointed out a qualitative discrepancy between a theoretical bounce-averaged model and a global kinetic simulation in predicting the collisional viscosity driven by the perturbation, the neoclassical toroidal viscosity (NTV). Clarifying the cause of the discrepancy by understanding the underlying mechanism is a key issue in establishing a reliable basis for NTV predictions. In this work, we perform two different kinds of global kinetic simulations of the NTV. As a result, it is demonstrated for the first time that the discrepancy arises owing to the following two mechanisms related to the global particle orbit: (1) the effective magnitude of the perturbation becomes weak due to the loss of the resonant orbit, and (2) phase mixing along the orbit arises and generates fine-scale structures, resulting in the damping of the NTV.
Mantica, P.*; Bourdelle, C.*; Camenen, Y.*; Dejarnac, R.*; Evans, T. E.*; Görler, T.*; Hillesheim, J.*; Idomura, Yasuhiro; Jakubowski, M.*; Ricci, P.*; et al.
Nuclear Fusion, 57(8), p.087001_1 - 087001_19, 2017/08
This conference report summarizes the contributions to, and discussions at, the 21st Joint EU-US Transport Task Force Workshop, held in Leysin, Switzerland, during 5-8 September 2016. The workshop was organized under 8 topics: progress towards full-F kinetic turbulence simulation; high and low Z impurity transport, control and effects on plasma confinement; 3D effects on core and edge transport (including MHD, external fields and stellarators); predictive experimental design; electron heat transport and multi-scale integration; understanding power decay length in the Scrape-Off Layer (SOL); role of the SOL in the L-H transition; validation of fundamental turbulence properties against turbulence measurements.
Physics of Plasmas, 24(8), p.080701_1 - 080701_5, 2017/08
An electron heating modulation numerical experiment based on a global full-f gyrokinetic model shows that transitions from ion temperature gradient driven (ITG) turbulence to trapped electron mode (TEM) turbulence induced by electron heating generate density peaking and rotation changes. Toroidal angular momentum balance during the rotation changes is revealed by direct observation of toroidal angular momentum conservation, in which in addition to ion turbulent stress, ion neoclassical stress, radial currents, and toroidal field stress of ions and electrons are important. Toroidal torque flipping between ITG and TEM phases is found to be related to reversal of the ion radial current that indicates coupling of particle and momentum transport channels. The ion and electron radial currents are balanced to satisfy the ambipolar condition, and the electron radial current is cancelled by the electron toroidal field stress, which indirectly affects toroidal torque.
Yamada, Susumu; Ina, Takuya*; Sasa, Narimasa; Idomura, Yasuhiro; Machida, Masahiko; Imamura, Toshiyuki*
Proceedings of 2017 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW) (Internet), p.1418 - 1425, 2017/08
no abstracts in English
Asahi, Yuichi*; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*
IEEE Transactions on Parallel and Distributed Systems, 28(7), p.1974 - 1988, 2017/07
High-dimensional stencil computations from fusion plasma turbulence codes, which involve complex memory access patterns such as the indirect memory access in a semi-Lagrangian scheme and the strided memory access in a finite-difference scheme, are optimized on accelerators such as GPGPUs and Xeon Phi coprocessors. On both devices, the Array of Structures of Arrays (AoSoA) data layout is preferable for contiguous memory accesses. It is shown that effective local cache usage, achieved by improving spatial and temporal data locality, is critical on Xeon Phi. On GPGPUs, texture memory usage improves the performance of the indirect memory accesses in the semi-Lagrangian scheme. Thanks to these optimizations, the fusion kernels on accelerators become 1.4x - 8.1x faster than those on Sandy Bridge (CPU).
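The AoSoA layout can be sketched with NumPy, where the assumed vector width W and the two-field particle record are illustrative rather than taken from the actual kernels:

```python
import numpy as np

n, W = 1024, 8                        # W: an assumed SIMD/vector width
aos = np.arange(n * 2, dtype=np.float64).reshape(n, 2)   # AoS: fields interleaved

# AoSoA: group elements into tiles of W and store each field contiguously
# within a tile, giving unit-stride vector loads plus cache-friendly tiles
aosoa = aos.reshape(n // W, W, 2).transpose(0, 2, 1).copy()

# within one tile, the W values of one field are contiguous in memory
assert aosoa[0, 0].strides == (aosoa.itemsize,)

# the transformation is lossless
assert np.array_equal(aosoa.transpose(0, 2, 1).reshape(n, 2), aos)
```

AoS interleaves fields (strided vector loads), SoA separates them fully (poor locality across fields); AoSoA keeps unit-stride loads within a tile while neighboring fields stay close in memory.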
Kawamura, Takuma; Noda, Tomoyuki; Idomura, Yasuhiro
Supercomputing Frontiers and Innovations, 4(3), p.43 - 54, 2017/07
We examine the performance of the in-situ data exploration framework based on in-situ particle-based volume rendering (In-Situ PBVR) on the latest many-core platform. In-Situ PBVR converts extreme-scale volume data into small rendering-primitive particle data via parallel Monte-Carlo sampling without costly visibility ordering. This feature avoids severe bottlenecks such as the limited memory size per node and the significant performance gap between computation and inter-node communication. In addition, remote in-situ data exploration is enabled by asynchronous file-based control sequences, which transfer the small particle data to client PCs, generate view-independent volume rendering images on client PCs, and change visualization parameters at runtime. In-Situ PBVR shows excellent strong scaling with low memory usage up to about 100k cores on the Oakforest-PACS, which consists of 8,208 Intel Xeon Phi 7250 (Knights Landing) processors.
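The order-independent particle generation at the core of PBVR can be sketched as simple rejection sampling, with a hypothetical scalar field and transfer function (not the actual In-Situ PBVR pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
grid = rng.random((16, 16, 16))      # a hypothetical scalar field on a 16^3 grid

def opacity(v):
    # a stand-in transfer function mapping scalar value to opacity in [0, 1]
    return v ** 2

# Monte-Carlo particle generation: keep a cell with probability equal to its
# opacity, so particle density follows the transfer function; the resulting
# opaque particles can then be rendered in any order (no visibility sorting)
keep = rng.random(grid.shape) < opacity(grid)
particles = np.argwhere(keep)        # integer coordinates of the generated particles

# roughly E[v^2] * 16^3 ~ 1/3 of the 4096 cells should emit a particle
assert 1100 < len(particles) < 1650
```

Because opacity is encoded in particle density rather than in blending order, the expensive visibility sort of classical volume rendering is avoided, which is what makes the approach attractive in situ.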
Maeyama, Shinya*; Watanabe, Tomohiko*; Idomura, Yasuhiro; Nakata, Motoki*; Ishizawa, Akihiro*; Nunami, Masanori*
Nuclear Fusion, 57(6), p.066036_1 - 066036_10, 2017/05
Multi-scale plasma turbulence including electron and ion temperature gradient (ETG/ITG) modes has been investigated by means of electromagnetic gyrokinetic simulations. Triad transfer analyses on nonlinear mode coupling reveal cross-scale interactions between electron and ion scales. One of the interactions is suppression of electron-scale turbulence by ion- scale turbulence, where ITG-driven short-wavelength eddies act like shear flows and suppress ETG turbulence. Another cross-scale interaction is enhancement of ion-scale turbulence in the presence of electron-scale turbulence. This is caused via short-wavelength zonal flows, which are created by the response of passing kinetic electrons in ITG turbulence, suppress ITG turbulence by their shearing, and are damped by ETG turbulence. In both cases, sub-ion-scale structures between electron and ion scales play important roles in the cross-scale interactions.