Search Results: Records 1-20 displayed on this page of 22


Journal Articles

Synergy of turbulent and neoclassical transport through poloidal convective cells

Asahi, Yuichi*; Grandgirard, V.*; Sarazin, Y.*; Donnel, P.*; Garbet, X.*; Idomura, Yasuhiro; Dif-Pradalier, G.*; Latu, G.*

Plasma Physics and Controlled Fusion, 61(6), p.065015_1 - 065015_15, 2019/05

 Percentile:100

The role of poloidal convective cells on transport processes is studied with the full-F gyrokinetic code GYSELA. For this purpose, we apply a numerical filter to convective cells and compare the simulation results with and without the filter. The energy flux driven by the magnetic drifts turns out to be reduced by a factor of about 2 once the numerical filter is applied. A careful analysis reveals that the frequency spectrum of the convective cells is well-correlated with that of the turbulent Reynolds stress tensor, giving credit to their turbulence-driven origin. The impact of convective cells can be interpreted as a synergy between turbulence and neoclassical dynamics.

Journal Articles

Application of a communication-avoiding generalized minimal residual method to a gyrokinetic five dimensional Eulerian code on many core platforms

Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu; Matsumoto, Kazuya*; Asahi, Yuichi*; Imamura, Toshiyuki*

Proceedings of 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2017), p.7_1 - 7_8, 2017/11

A communication-avoiding generalized minimal residual (CA-GMRES) method is applied to the gyrokinetic toroidal five dimensional Eulerian code GT5D, and its performance is compared against the original code with a generalized conjugate residual (GCR) method on the JAEA ICEX (Haswell), the Plasma Simulator (FX100), and the Oakforest-PACS (KNL). The CA-GMRES method has $$\sim 3.8\times$$ higher arithmetic intensity than the GCR method, and thus, is suitable for future Exa-scale architectures with limited memory and network bandwidths. In the performance evaluation, it is shown that compared with the GCR solver, its computing kernels are accelerated by $$1.47\times \sim 2.39\times$$, and the cost of data reduction communication is reduced from $$5\% \sim 13\%$$ to $$\sim 1\%$$ of the total cost at 1,280 nodes.

Journal Articles

Benchmarking of flux-driven full-F gyrokinetic simulations

Asahi, Yuichi*; Grandgirard, V.*; Idomura, Yasuhiro; Garbet, X.*; Latu, G.*; Sarazin, Y.*; Dif-Pradalier, G.*; Donnel, P.*; Ehrlacher, C.*

Physics of Plasmas, 24(10), p.102515_1 - 102515_17, 2017/10

AA2017-0418.pdf:4.26MB

 Percentile:100(Physics, Fluids & Plasmas)

Two full-F global gyrokinetic codes are benchmarked to compute flux-driven ion temperature gradient (ITG) turbulence in tokamak plasmas. For this purpose, the Semi-Lagrangian code GYSELA and the Eulerian code GT5D are employed, which solve the full-F gyrokinetic equation with a realistic fixed flux condition. Using the appropriate settings for the boundary and initial conditions, flux-driven ITG turbulence simulations are carried out. The avalanche-like transport is assessed with a focus on spatio-temporal properties. A statistical analysis is performed to discuss this self-organized criticality (SOC)-like behavior, where we found $$1/f$$ spectra and a transition to $$1/f^3$$ spectra on the high-frequency side in both codes. Based on these benchmarks, it is verified that the SOC-like behavior is robust and not dependent on numerics.

Journal Articles

Optimization of fusion kernels on accelerators with indirect or strided memory access patterns

Asahi, Yuichi*; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*

IEEE Transactions on Parallel and Distributed Systems, 28(7), p.1974 - 1988, 2017/07

 Times Cited Count:1 Percentile:60.51(Computer Science, Theory & Methods)

High-dimensional stencil computations from fusion plasma turbulence codes involving complex memory access patterns, namely the indirect memory access in a Semi-Lagrangian scheme and the strided memory access in a Finite-Difference scheme, are optimized on accelerators such as GPGPUs and Xeon Phi coprocessors. On both devices, the Array of Structure of Array (AoSoA) data layout is preferable for contiguous memory accesses. It is shown that effective local cache usage, achieved by improving spatial and temporal data locality, is critical on Xeon Phi. On GPGPU, the texture memory usage improves the performance of the indirect memory accesses in the Semi-Lagrangian scheme. Thanks to these optimizations, the fusion kernels on accelerators become 1.4x - 8.1x faster than those on Sandy Bridge (CPU).

Journal Articles

Computational challenges towards Exa-scale fusion plasma turbulence simulations

Idomura, Yasuhiro; Asahi, Yuichi; Ina, Takuya; Matsuoka, Seikichi

Proceedings of 24th International Congress of Theoretical and Applied Mechanics (ICTAM 2016), p.3106 - 3107, 2016/08

Turbulent transport in fusion plasmas is one of the key issues in ITER. To address this issue via the five dimensional (5D) gyrokinetic model, a novel computing technique is developed, and strong scaling of the Gyrokinetic Toroidal 5D Eulerian code GT5D is improved up to $$\sim 0.6$$ million cores on the K-computer. The computing technique consists of multi-dimensional/multi-layer domain decomposition, overlap of communication and computation, and optimization of computing kernels for multi-core CPUs. The computing power enabled us to study ITER relevant issues such as the plasma size scaling of turbulent transport. Towards the next generation burning plasma turbulence simulations, the physics model is extended to include kinetic electrons and multi-species ions, and computing kernels are further optimized for the latest many-core architectures.

Journal Articles

Erosion of $$N$$=20 shell in $$^{33}$$Al investigated through the ground-state electric quadrupole moment

Shimada, Kenji*; Ueno, Hideki*; Neyens, G.*; Asahi, Koichiro*; Balabanski, D. L.*; Daugas, J. M.*; Depuydt, M.*; De Rydt, M.*; Gaudefroy, L.*; Grévy, S.*; et al.

Physics Letters B, 714(2-5), p.246 - 250, 2012/08

 Times Cited Count:5 Percentile:57.78(Astronomy & Astrophysics)

no abstracts in English

Journal Articles

Precision measurement of the electric quadrupole moment of $$^{31}$$Al and determination of the effective proton charge in the sd-shell

De Rydt, M.*; Neyens, G.*; Asahi, Koichiro*; Balabanski, D. L.*; Daugas, J. M.*; Depuydt, M.*; Gaudefroy, L.*; Grévy, S.*; Hasama, Yuka*; Ichikawa, Yuichi*; et al.

Physics Letters B, 678(4), p.344 - 349, 2009/07

 Times Cited Count:11 Percentile:35.09(Astronomy & Astrophysics)

no abstracts in English

Oral presentation

The Influence of trapped electron mode driven zonal flow on electron temperature gradient driven turbulence

Asahi, Yuichi; Ishizawa, Akihiro*; Watanabe, Tomohiko*; Sugama, Hideo*; Tsutsui, Hiroaki*; Iio, Shunji*

no journal, , 

Turbulent transport driven by electron temperature gradient (ETG) modes and trapped electron modes (TEMs) is investigated by means of gyrokinetic simulations. It is found that ETG turbulence can be suppressed by zonal flows driven by TEMs. Then, the mechanism of the regulation of ETG turbulence by zonal flows is investigated by nonlinear entropy transfer analysis. Firstly, it is confirmed that the entropy is transferred from TEMs to zonal flows. Secondly, it is found that the zonal flows in the steady state mediate the entropy transfer of the ETG modes from low to high radial wavenumber regions. In short, it is quantitatively shown that the zonal flows are driven by TEMs and the ETG turbulence is regulated by the TEM-driven zonal flows.

Oral presentation

Optimization of fusion plasma codes

Asahi, Yuichi; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Virginie, G.*; Garbet, X.*

no journal, , 

We present the optimization of kernels from fusion plasma codes, GYSELA and GT5D, on Tera-flops many-core architectures including accelerators (Xeon Phi, GPU), and a multi-core CPU (FX100). The GYSELA kernel is based on a semi-Lagrangian scheme with high arithmetic intensity. Through the optimization of the GYSELA kernel on Xeon Phi, we show the importance of vectorization on Xeon Phi. For the GT5D kernel, which is based on a finite difference scheme, a sophisticated memory access is necessary for high performance. Through the optimization of the GT5D kernel on GPUs, we show the effective optimization for memory access on GPUs.

Oral presentation

Optimization of fusion plasma turbulence code on GPU

Asahi, Yuichi; Idomura, Yasuhiro; Ina, Takuya

no journal, , 

Because of their collisionless nature, fusion plasmas have fine structures in velocity space. Thus, for the analysis of fusion plasma turbulence, which degrades plasma confinement, five dimensional kinetic models are often employed rather than the usual three dimensional fluid model. In this study, we present the optimization of a fusion plasma code, GT5D, which employs a finite difference method. We then discuss the optimization techniques effective for high dimensional stencil based calculations.

Oral presentation

Optimization of stencil-based fusion kernels on Tera-flops many-core architectures

Asahi, Yuichi; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*

no journal, , 

We present the optimization of kernels from fusion plasma codes, GYSELA and GT5D, on Tera-flops many-core architectures including accelerators (Xeon Phi, GPU), and a multi-core CPU (FX100). GYSELA kernel is based on a semi-Lagrangian scheme with high arithmetic intensity. Through the optimization of GYSELA kernel on Xeon Phi, we show the importance of the vectorization of a code. For GT5D kernel, which is based on a finite difference scheme, a sophisticated memory access is necessary for attaining high performance. Through the optimization of GT5D kernel on GPUs, we show the effective optimization for memory access with the help of the shared memory.

Oral presentation

Development of optimization of stencil calculation on Tera-flops many-core architecture

Ina, Takuya; Asahi, Yuichi; Idomura, Yasuhiro

no journal, , 

Plasma turbulence simulation requires significant computational resources. In particular, an Exa-scale machine is essential for simulations at the scale of the International Thermonuclear Experimental Reactor (ITER). The architecture of Exa-scale machines is undecided, but it is believed that it will be based on existing architectures. The purpose of this study is to establish optimization techniques for stencil calculations on Xeon Phi, GPU, and FX100, architectures that offer teraflops-class computing performance. For the Xeon Phi, we apply dynamic scheduling and change multiple loops into a single loop. For the GPU, we reuse registers and avoid warp divergence. For the FX100, we promote software prefetching and adjust the chunk size so that the L1 and L2 caches are reused. Performance is improved by applying these optimizations to the Xeon Phi, GPU, and FX100. We confirmed effective optimization methods for stencil calculations on Xeon Phi, GPU, and FX100.

Oral presentation

Effects of passing electrons on electron heat and particle transport

Asahi, Yuichi; Idomura, Yasuhiro; Maeyama, Shinya*; Nakata, Motoki*; Ishizawa, Akihiro*; Watanabe, Tomohiko*

no journal, , 

In conventional local turbulence simulations of ion-scale turbulence such as ion temperature gradient turbulence, electrons have been assumed to respond adiabatically. In particular, passing electrons are considered to be adiabatic because of their fast motion along the magnetic field line. In contrast, a recent study shows that non-adiabatic, i.e., kinetic responses of passing electrons can be important near the low-order mode rational surface, and they can contribute to heat and particle transport. In the presentation, we will show the impacts of passing electrons on the heat and particle transport through comparisons of local turbulence simulations employing different kinetic electron models.

Oral presentation

Benchmark test of full-f gyrokinetic codes

Asahi, Yuichi; Idomura, Yasuhiro; Ina, Takuya; Garbet, X.*; Grandgirard, V.*; Latu, G.*

no journal, , 

In so-called delta-f gyrokinetic simulations, a scale separation between the equilibrium and fluctuation plasmas is assumed, and the time evolution is solved only for the fluctuation part. In contrast, in full-f gyrokinetic simulations, both the equilibrium and fluctuation plasmas are solved on the basis of the same first principles, which makes self-consistent simulations of the equilibrium and fluctuation plasmas possible. So far, there have been many cross-code benchmarks for delta-f gyrokinetic simulations, which help to improve the robustness of the simulations. However, this is not the case for full-f simulations, since the complicated full-f physics makes benchmarks more difficult. In the presentation, we will show the progress of the full-f benchmarks and discuss the outstanding issues.

Oral presentation

High performance implementation of nuclear fusion simulation code on GPU cluster

Matsumoto, Kazuya; Asahi, Yuichi*; Ina, Takuya; Idomura, Yasuhiro

no journal, , 

We present the implementation and performance evaluation results of the plasma physics simulation code called GT5D on a GPU cluster. In this study, an iterative matrix solver, which is identified as a performance bottleneck in the code, is tuned on the GPU. The measured performance is compared with the attainable performance calculated by the roofline model. Additionally, we show an implementation with direct communication between GPUs for utilizing many GPUs.

Oral presentation

Full-f gyrokinetic simulation including kinetic electrons

Idomura, Yasuhiro; Asahi, Yuichi*; Hayashi, Nobuhiko*; Urano, Hajime*

no journal, , 

Full-f gyrokinetic simulations are important tools for analyzing nonlocal turbulent transport, plasma profiles, and the confinement time in fusion plasmas. However, conventional full-f simulations were limited to ion turbulence with adiabatic electrons. In order to analyze ITER relevant electron turbulence, in this work, we develop a new hybrid electron model for full-f simulations, and verify its accuracy. In the model, passing electron responses, which induce high frequency noise, are approximated by analytic solutions, and long time scale full-f simulations are enabled by eliminating the high frequency noise. Numerical experiments of electron turbulence using this model clarify new mechanisms for turbulence suppression and momentum transport related to electron turbulence transport. In a validation study, an experimental observation on plasma rotation changes induced by electron heating is successfully reproduced.

Oral presentation

Benchmarking of global full-f gyrokinetic codes

Asahi, Yuichi*; Garbet, X.*; Idomura, Yasuhiro; Grandgirard, V.*; Latu, G.*; Sarazin, Y.*; Dif-Pradalier, G.*; Donnel, P.*; Ehrlacher, C.*; Passeron, Ch.*

no journal, , 

Two global full-f gyrokinetic codes, which have been developed at CEA and JAEA, are benchmarked. Quantitative agreements between two codes are obtained regarding linear processes such as the linear stability of ion temperature gradient driven modes, the linear damping of zonal flows, and the collisional transport. Preliminary benchmarks on nonlinear turbulence simulations show some differences of calculation results, which arise due to differences in calculation models such as boundary conditions and heat source models, and the remaining issues towards quantitative nonlinear benchmarks are clarified.

Oral presentation

Acceleration of stencil-based fusion kernels

Asahi, Yuichi*; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*

no journal, , 

Computation kernels of fusion plasma turbulence codes based on the Semi-Lagrangian scheme and the Finite-Difference scheme are optimized on the latest many core processors such as GPGPU, Xeon Phi, and FX100, and 1.4x-8.1x speedup is achieved. Affinity between different memory access patterns in each numerical scheme and different memory-cache architectures on each hardware is studied, and different optimization techniques are developed for each architecture. On Xeon Phi, thread load balance is improved, and an optimization technique for effective local cache usage is developed. On GPGPU, an optimization technique using a texture memory and an implementation to reuse registers are developed. On the other hand, on FX100, it is found that the conventional optimization techniques for CPU work.

Oral presentation

Results from BMTFF projects

Asahi, Yuichi*; Grandgirard, V.*; Idomura, Yasuhiro; Sarazin, Y.*; Latu, G.*; Garbet, X.*

no journal, , 

This talk reviews outcomes from the BMTFF project, which was conducted in FY2015-2016. In this project, in order to establish a firm basis of full-f gyrokinetic models, two major full-f gyrokinetic codes in EU and Japan, GYSELA and GT5D, were benchmarked. In FY2015, all the numerical implementations were examined, and boundary conditions were fixed to be the same. With this correction, collisional transport, linear zonal flow damping, and linear stability of the ion temperature gradient driven (ITG) mode were successfully benchmarked. In FY2016, the same source and sink models were implemented in both codes, and nonlinear turbulence simulations were benchmarked. Decaying ITG turbulence simulations without heat sources showed similar profile relaxation processes, and nonlinear critical temperature gradients agreed quantitatively with each other. On the other hand, driven ITG turbulence simulations with heat sources showed intermittent bursts of avalanche-like transport with similar 1/f-type frequency spectra.

Oral presentation

Performance evaluation of a modified communication-avoiding generalized minimal residual method on many core platforms

Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu; Matsumoto, Kazuya*; Asahi, Yuichi*; Imamura, Toshiyuki*

no journal, , 

We propose a modified communication-avoiding generalized minimal residual (CA-GMRES) method, which reduces both computation and memory access by 30% while keeping the same CA property as the original CA-GMRES method. These numerical properties, less communication and computation with higher arithmetic intensity, are promising features for future exascale machines with limited memory and network bandwidths. The modified CA-GMRES method is applied to a large scale non-symmetric matrix in an implicit solver of the gyrokinetic toroidal five dimensional Eulerian code GT5D, and its performance is estimated on the Oakforest-PACS (KNL). The numerical experiment shows that compared with the generalized conjugate residual method, computing kernels are accelerated by 1.5x, and the cost of data reduction communication is reduced from 12.5% to 1% of the total cost at 1,280 nodes.
