Idomura, Yasuhiro; Obrejan, K.*; Asahi, Yuichi; Honda, Mitsuru*
Physics of Plasmas, 28(1), p.012501_1 - 012501_11, 2021/01
Tracer impurity transport in ion temperature gradient driven (ITG) turbulence is investigated using a global full-f gyrokinetic simulation including kinetic electrons, bulk ions, and low to medium Z tracer impurities, where Z is the charge number. It is found that in addition to turbulent particle transport, enhanced neoclassical particle transport due to a new synergy effect between turbulent and neoclassical transports makes a significant contribution to tracer impurity transport. Bursty excitation of the ITG mode generates non-ambipolar turbulent particle fluxes of electrons and bulk ions, leading to a fast growth of the radial electric field following the ambipolar condition. The divergence of flows compresses up-down asymmetric density perturbations, which are subject to transport induced by the magnetic drift. The enhanced neoclassical particle transport depends on the ion mass, because the magnitude of up-down asymmetric density perturbation is determined by a competition between the compression effect and the return current given by the parallel streaming motion. This mechanism does not work for the temperature, and thus, selectively enhances only particle transport.
Asahi, Yuichi; Latu, G.*; Bigot, J.*; Grandgirard, V.*
Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.218 - 224, 2020/10
Performance portability is expected to be a critical issue in the upcoming exascale era. We explore a performance portable approach for a fusion plasma turbulence simulation code employing the kinetic model, namely the GYSELA code. For this purpose, we extract the key features of GYSELA, such as the high dimensionality (more than 4D) and the semi-Lagrangian scheme, and encapsulate them into a mini-application that solves a Vlasov-Poisson system similar to, but simpler than, that of GYSELA. We implement the mini-app with OpenACC, OpenMP4.5 and Kokkos, where we suppress unnecessary duplications of code lines. Based on our experience, we discuss the advantages and disadvantages of OpenACC, OpenMP4.5 and Kokkos from the viewpoint of performance portability, readability and productivity.
Asahi, Yuichi*; Latu, G.*; Bigot, J.*; Maeyama, Shinya*; Grandgirard, V.*; Idomura, Yasuhiro
Concurrency and Computation: Practice and Experience, 32(5), p.e5551_1 - e5551_21, 2020/03
Two five-dimensional gyrokinetic codes, GYSELA and GKV, were ported to modern accelerators, Xeon Phi KNL and Tesla P100 GPU. Serial computing kernels of GYSELA on KNL and GKV on P100 GPU were respectively 1.3x and 7.4x faster than those on a single Skylake processor. Scaling tests of GYSELA and GKV were respectively performed from 16 to 512 KNLs and from 32 to 256 P100 GPUs, and data transpose communications in semi-Lagrangian kernels in GYSELA and in convolution kernels in GKV were found to be the respective main bottlenecks. In order to mitigate the communication costs, pipeline-based and task-based communication overlapping were implemented in these codes.
Asahi, Yuichi*; Grandgirard, V.*; Sarazin, Y.*; Donnel, P.*; Garbet, X.*; Idomura, Yasuhiro; Dif-Pradalier, G.*; Latu, G.*
Plasma Physics and Controlled Fusion, 61(6), p.065015_1 - 065015_15, 2019/05
The role of poloidal convective cells in transport processes is studied with the full-F gyrokinetic code GYSELA. For this purpose, we apply a numerical filter to convective cells and compare the simulation results with and without the filter. The energy flux driven by the magnetic drifts turns out to be reduced by a factor of about 2 once the numerical filter is applied. A careful analysis reveals that the frequency spectrum of the convective cells is well correlated with that of the turbulent Reynolds stress tensor, lending credence to their turbulence-driven origin. The impact of convective cells can be interpreted as a synergy between turbulence and neoclassical dynamics.
Donnel, P.*; Garbet, X.*; Sarazin, Y.*; Asahi, Yuichi; Wilczynski, F.*; Caschera, E.*; Dif-Pradalier, G.*; Ghendrih, P.*; Gillot, C.*
Plasma Physics and Controlled Fusion, 61(1), p.014003_1 - 014003_11, 2019/01
Poloidal asymmetries of the plasma flow are known to play a role in neoclassical transport. According to conventional neoclassical theory, the level of poloidal asymmetry of the electric potential is expected to be very small. In the present work, a general framework for the generation of axisymmetric structures of potential by turbulence is presented. Zonal flows, geodesic acoustic modes and convective cells are described by a single model. This is done by solving the gyrokinetic equation coupled to the quasi-neutrality equation. The result is a predictive calculation of the frequency spectrum of flows given a specified forcing due to turbulence. It also shows that the dominant mechanism comes from zonal flow compression at intermediate frequencies, while ballooning of the turbulence Reynolds stress appears to be the main drive at low frequency.
Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu; Matsumoto, Kazuya*; Asahi, Yuichi*; Imamura, Toshiyuki*
Proceedings of 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2017), p.7_1 - 7_8, 2017/11
A communication-avoiding generalized minimal residual (CA-GMRES) method is applied to the gyrokinetic toroidal five dimensional Eulerian code GT5D, and its performance is compared against the original code with a generalized conjugate residual (GCR) method on the JAEA ICEX (Haswell), the Plasma Simulator (FX100), and the Oakforest-PACS (KNL). The CA-GMRES method has higher arithmetic intensity than the GCR method, and thus, is suitable for future Exa-scale architectures with limited memory and network bandwidths. In the performance evaluation, it is shown that compared with the GCR solver, its computing kernels are accelerated by , and the cost of data reduction communication is reduced from to of the total cost at 1,280 nodes.
Asahi, Yuichi*; Grandgirard, V.*; Idomura, Yasuhiro; Garbet, X.*; Latu, G.*; Sarazin, Y.*; Dif-Pradalier, G.*; Donnel, P.*; Ehrlacher, C.*
Physics of Plasmas, 24(10), p.102515_1 - 102515_17, 2017/10
Two full-F global gyrokinetic codes are benchmarked to compute flux-driven ion temperature gradient turbulence in tokamak plasmas. For this purpose, the Semi-Lagrangian code GYSELA and the Eulerian code GT5D are employed, which solve the full-F gyrokinetic equation with a realistic fixed flux condition. Using the appropriate settings for the boundary and initial conditions, flux-driven ITG turbulence simulations are carried out. The avalanche-like transport is assessed with a focus on spatio-temporal properties. A statistical analysis is performed to discuss this self-organized criticality (SOC) like behavior, where we found spectra and a transition to spectra at the high-frequency side in both codes. Based on these benchmarks, it is verified that the SOC-like behavior is robust and not dependent on numerics.
Asahi, Yuichi*; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*
IEEE Transactions on Parallel and Distributed Systems, 28(7), p.1974 - 1988, 2017/07
High-dimensional stencil computations from fusion plasma turbulence codes, which involve complex memory access patterns (the indirect memory access in a Semi-Lagrangian scheme and the strided memory access in a Finite-Difference scheme), are optimized on accelerators such as GPGPUs and Xeon Phi coprocessors. On both devices, the Array of Structure of Array (AoSoA) data layout is preferable for contiguous memory accesses. It is shown that effective use of the local cache, achieved by improving spatial and temporal data locality, is critical on Xeon Phi. On GPGPU, the texture memory usage improves the performance of the indirect memory accesses in the Semi-Lagrangian scheme. Thanks to these optimizations, the fusion kernels on accelerators become 1.4x - 8.1x faster than those on Sandy Bridge (CPU).
Idomura, Yasuhiro; Asahi, Yuichi; Ina, Takuya; Matsuoka, Seikichi
Proceedings of 24th International Congress of Theoretical and Applied Mechanics (ICTAM 2016), p.3106 - 3107, 2016/08
Turbulent transport in fusion plasmas is one of the key issues in ITER. To address this issue via the five dimensional (5D) gyrokinetic model, a novel computing technique is developed, and strong scaling of the Gyrokinetic Toroidal 5D Eulerian code GT5D is improved up to million cores on the K-computer. The computing technique consists of multi-dimensional/multi-layer domain decomposition, overlap of communication and computation, and optimization of computing kernels for multi-core CPUs. The computing power enabled us to study ITER relevant issues such as the plasma size scaling of turbulent transport. Towards the next generation burning plasma turbulence simulations, the physics model is extended to include kinetic electrons and multi-species ions, and computing kernels are further optimized for the latest many-core architectures.
Shimada, Kenji*; Ueno, Hideki*; Neyens, G.*; Asahi, Koichiro*; Balabanski, D. L.*; Daugas, J. M.*; Depuydt, M.*; De Rydt, M.*; Gaudefroy, L.*; Grévy, S.*; et al.
Physics Letters B, 714(2-5), p.246 - 250, 2012/08
no abstracts in English
De Rydt, M.*; Neyens, G.*; Asahi, Koichiro*; Balabanski, D. L.*; Daugas, J. M.*; Depuydt, M.*; Gaudefroy, L.*; Grévy, S.*; Hasama, Yuka*; Ichikawa, Yuichi*; et al.
Physics Letters B, 678(4), p.344 - 349, 2009/07
no abstracts in English
Asahi, Yuichi; Ishizawa, Akihiro*; Watanabe, Tomohiko*; Sugama, Hideo*; Tsutsui, Hiroaki*; Iio, Shunji*
no journal, ,
Turbulent transport driven by electron temperature gradient (ETG) modes and trapped electron modes (TEMs) is investigated by means of gyrokinetic simulations. It is found that ETG turbulence can be suppressed by zonal flows driven by TEMs. The mechanism by which zonal flows regulate ETG turbulence is then investigated by nonlinear entropy transfer analysis. Firstly, it is confirmed that the entropy is transferred from TEMs to zonal flows. Secondly, it is found that the zonal flows in the steady state mediate the entropy transfer of the ETG modes from low to high radial wavenumber regions. In short, it is quantitatively shown that the zonal flows are driven by TEMs and that the ETG turbulence is regulated by the TEM-driven zonal flows.
Asahi, Yuichi; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*
no journal, ,
We present the optimization of kernels from the fusion plasma codes GYSELA and GT5D on Tera-flops many-core architectures including accelerators (Xeon Phi, GPU) and a multi-core CPU (FX100). The GYSELA kernel is based on a semi-Lagrangian scheme with high arithmetic intensity. Through the optimization of the GYSELA kernel on Xeon Phi, we show the importance of vectorization on Xeon Phi. For the GT5D kernel, which is based on a finite difference scheme, a sophisticated memory access pattern is necessary for high performance. Through the optimization of the GT5D kernel on GPUs, we show effective optimizations for memory access on GPUs.
Asahi, Yuichi; Idomura, Yasuhiro; Ina, Takuya
no journal, ,
Because of their collisionless character, fusion plasmas have fine structures in velocity space. Thus, for the analysis of fusion plasma turbulence, which degrades plasma confinement, five-dimensional kinetic models are often employed rather than the usual three-dimensional fluid model. In this study, we present the optimization of a fusion plasma code, GT5D, which employs a finite difference method. We then discuss the optimization techniques effective for high-dimensional stencil-based calculations.
Asahi, Yuichi; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*
no journal, ,
We present the optimization of kernels from the fusion plasma codes GYSELA and GT5D on Tera-flops many-core architectures including accelerators (Xeon Phi, GPU) and a multi-core CPU (FX100). The GYSELA kernel is based on a semi-Lagrangian scheme with high arithmetic intensity. Through the optimization of the GYSELA kernel on Xeon Phi, we show the importance of vectorizing the code. For the GT5D kernel, which is based on a finite difference scheme, a sophisticated memory access pattern is necessary for attaining high performance. Through the optimization of the GT5D kernel on GPUs, we show effective optimizations for memory access with the help of the shared memory.
Ina, Takuya; Asahi, Yuichi; Idomura, Yasuhiro
no journal, ,
Plasma turbulence simulations require significant computational resources. In particular, simulations at the scale of the International Thermonuclear Experimental Reactor (ITER) will require Exa-scale machines. The architecture of Exa-scale machines is still undecided, but it is expected to be based on existing architectures. The purpose of this study is to establish optimization techniques for stencil calculations on Xeon Phi, GPU and FX100, all of which offer teraflops-class computing performance. On Xeon Phi, we apply dynamic scheduling and convert multiple nested loops into a single loop. On GPU, we reuse registers and avoid warp divergence. On FX100, we promote software prefetching and adjust the chunk size so that data in the L1 and L2 caches are reused. Applying these optimizations improves performance on Xeon Phi, GPU and FX100, which confirms that they are effective optimization methods for stencil calculations on these architectures.
Asahi, Yuichi; Idomura, Yasuhiro; Maeyama, Shinya*; Nakata, Motoki*; Ishizawa, Akihiro*; Watanabe, Tomohiko*
no journal, ,
In conventional local simulations of ion-scale turbulence such as ion temperature gradient turbulence, electrons have been assumed to respond adiabatically. In particular, passing electrons are considered to be adiabatic because of their fast motion along the magnetic field line. In contrast, a recent study shows that non-adiabatic, i.e., kinetic, responses of passing electrons can be important near low-order mode rational surfaces, and that they can contribute to heat and particle transport. In the presentation, we will show the impacts of passing electrons on the heat and particle transport through comparisons of local turbulence simulations employing different kinetic electron models.
Asahi, Yuichi; Idomura, Yasuhiro; Ina, Takuya; Garbet, X.*; Grandgirard, V.*; Latu, G.*
no journal, ,
In so-called delta-f gyrokinetic simulations, a scale separation between the equilibrium and fluctuation parts of the plasma is assumed, and the time evolution is solved only for the fluctuation part. In contrast, in full-f gyrokinetic simulations, both the equilibrium and fluctuation parts are solved on the basis of the same first principles, which makes self-consistent simulations of the equilibrium and fluctuations possible. So far, there have been many cross-code benchmarks for delta-f gyrokinetic simulations, which help to improve the robustness of the simulations. However, this is not the case for full-f simulations, since the complicated full-f physics makes benchmarks more difficult. In the presentation, we will show the progress of the full-f benchmarks and discuss the outstanding issues.
Matsumoto, Kazuya; Asahi, Yuichi*; Ina, Takuya; Idomura, Yasuhiro
no journal, ,
We present the implementation and performance evaluation results of the plasma physics simulation code GT5D on a GPU cluster. In this study, an iterative matrix solver, which is identified as a performance bottleneck in the code, is tuned on the GPU. The measured performance is compared with the attainable performance calculated by the roofline model. Additionally, we show an implementation with direct communication between GPUs for utilizing many GPUs.
Idomura, Yasuhiro; Asahi, Yuichi*; Hayashi, Nobuhiko*; Urano, Hajime*
no journal, ,
Full-f gyrokinetic simulations are important tools for analyzing nonlocal turbulent transport, plasma profiles, and the confinement time in fusion plasmas. However, conventional full-f simulations were limited to ion turbulence with adiabatic electrons. In order to analyze ITER relevant electron turbulence, in this work, we develop a new hybrid electron model for full-f simulations and verify its accuracy. In the model, passing electron responses, which induce high-frequency noise, are approximated by analytic solutions, and long time scale full-f simulations are enabled by eliminating the high-frequency noise. Numerical experiments of electron turbulence using this model clarify new mechanisms for turbulence suppression and momentum transport related to electron turbulent transport. In a validation study, an experimental observation of plasma rotation changes induced by electron heating is successfully reproduced.