Search Results: Records 1-20 displayed on this page of 380

Journal Articles

Dynamics of enhanced neoclassical particle transport of tracer impurity ions in ion temperature gradient driven turbulence

Idomura, Yasuhiro; Obrejan, K.*; Asahi, Yuichi; Honda, Mitsuru*

Physics of Plasmas, 28(1), p.012501_1 - 012501_11, 2021/01

Tracer impurity transport in ion temperature gradient driven (ITG) turbulence is investigated using a global full-$$f$$ gyrokinetic simulation including kinetic electrons, bulk ions, and low to medium $$Z$$ tracer impurities, where $$Z$$ is the charge number. It is found that in addition to turbulent particle transport, enhanced neoclassical particle transport due to a new synergy effect between turbulent and neoclassical transport makes a significant contribution to tracer impurity transport. Bursty excitation of the ITG mode generates non-ambipolar turbulent particle fluxes of electrons and bulk ions, leading to a fast growth of the radial electric field following the ambipolar condition. The divergence of $$E\times B$$ flows compresses up-down asymmetric density perturbations, which are subject to transport induced by the magnetic drift. The enhanced neoclassical particle transport depends on the ion mass, because the magnitude of the up-down asymmetric density perturbation is determined by a competition between the $$E\times B$$ compression effect and the return current given by the parallel streaming motion. This mechanism does not work for the temperature and thus selectively enhances only particle transport.
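
To make the transport chain above concrete, here is a schematic form of the two relations invoked (standard gyrokinetic notation, assumed here rather than quoted from the paper). The ambipolar condition balances the species fluxes,

$$\Gamma_e = \Gamma_i + Z\,\Gamma_z,$$

and the compressing flow is the $$E\times B$$ drift,

$$v_{E\times B} = \frac{E\times B}{B^{2}},$$

so a bursty, non-ambipolar $$\Gamma_e$$ and $$\Gamma_i$$ force a fast evolution of the radial electric field, whose flow divergence then compresses the impurity density perturbations.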

Journal Articles

Plume dispersion simulation based on ensemble simulation with lattice Boltzmann method

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Dai-34-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Internet), 3 Pages, 2020/12

We developed a real-time ensemble simulation code for analyzing urban wind conditions and plume dispersion using a locally mesh-refined lattice Boltzmann method. We validated the developed code against a wind tunnel experiment by AIST and against the field experiment JU2003 in Oklahoma City. In the wind tunnel case, the computed wind condition showed good agreement with the experiment, and 61.2% of the tracer gas concentration data observed on the ground satisfied the FACTOR2 condition, an accuracy criterion given by the environmental assessment guideline. In the JU2003 case, the instantaneous wind speed showed good agreement with the experiment, while the wind direction showed differences of up to 100$$^{\circ}$$. The mean tracer gas concentrations satisfied the FACTOR2 condition at all observation intervals. These results demonstrate that the developed code is accurate enough for environmental assessment.
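
For reference, the FACTOR2 (often written FAC2) criterion counts a prediction as acceptable when it lies within a factor of two of the observation. A minimal sketch of the metric, assuming a simple positive-value threshold (the paper's exact handling of near-zero concentrations is not specified):

```python
import numpy as np

def fac2(predicted, observed):
    """Fraction of samples with 0.5 <= predicted/observed <= 2.0."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    valid = observed > 0.0  # assumption: skip zero observations
    ratio = predicted[valid] / observed[valid]
    return np.mean((ratio >= 0.5) & (ratio <= 2.0))

# A reported value of 61.2% corresponds to fac2(...) == 0.612.
print(fac2([1.0, 3.0, 0.4], [1.5, 1.0, 0.5]))  # 2 of 3 within factor 2 -> 0.667
```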

Journal Articles

Acceleration of fusion plasma turbulence simulation on Fugaku and Summit

Idomura, Yasuhiro; Ina, Takuya*; Ali, Y.*; Imamura, Toshiyuki*

Dai-34-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Internet), 6 Pages, 2020/12

A new communication-avoiding (CA) Krylov solver with an FP16 (half precision) preconditioner is developed for a semi-implicit finite difference solver in the Gyrokinetic Toroidal 5D full-$$f$$ Eulerian code GT5D. In the solver, the bottleneck of global collective communication is resolved using a CA Krylov subspace method, and halo data communication is reduced by the FP16 preconditioner, which improves the convergence property. The FP16 preconditioner is designed based on the physics properties of the operator and is implemented using the new support for FP16 SIMD operations on A64FX. The solver is also ported to GPUs, and the performance of ITER-size simulations with $$\sim 0.1$$ trillion grids is measured on Fugaku (A64FX) and Summit (V100). The new solver accelerates GT5D by $$2\sim 3\times$$ over the conventional non-CA solver, and excellent strong scaling is obtained up to 5,760 CPUs/GPUs on both Fugaku and Summit.
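
The division of labor in such a mixed-precision solver can be pictured with a toy preconditioner: store and apply the preconditioner in FP16, but keep the Krylov vectors in FP64. A minimal sketch with a plain Jacobi preconditioner standing in for GT5D's physics-based design (NumPy assumed; the actual implementation uses FP16 SIMD on A64FX):

```python
import numpy as np

def make_fp16_jacobi(A):
    """Half-precision Jacobi preconditioner: M^{-1} ~ diag(A)^{-1} in FP16."""
    inv_diag = (1.0 / np.diag(A)).astype(np.float16)  # FP16 storage halves memory traffic
    def apply(r):
        # apply in FP16, promote back to FP64 for the Krylov iteration
        return (inv_diag * r.astype(np.float16)).astype(np.float64)
    return apply

A = np.diag(np.array([4.0, 5.0, 6.0]))
precond = make_fp16_jacobi(A)
print(precond(np.ones(3)))  # ~ [0.25, 0.2, 0.1667], up to FP16 rounding
```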

Journal Articles

Acceleration of fusion plasma turbulence simulations using the mixed-precision communication-avoiding Krylov method

Idomura, Yasuhiro; Ina, Takuya*; Ali, Y.*; Imamura, Toshiyuki*

Proceedings of International Conference on High Performance Computing, Networking, Storage, and Analysis (SC 2020) (Internet), p.1318 - 1330, 2020/11

The multi-scale full-$$f$$ simulation of the next-generation experimental fusion reactor ITER based on a five-dimensional (5D) gyrokinetic model is one of the most computationally demanding problems in fusion science. In this work, a Gyrokinetic Toroidal 5D Eulerian code (GT5D) is accelerated by a new mixed-precision communication-avoiding (CA) Krylov method. The bottleneck of global collective communication on accelerated computing platforms is resolved using a CA Krylov method. In addition, a new FP16 preconditioner, designed using the new support for FP16 SIMD operations on A64FX, reduces both the number of iterations (halo data communication) and the computational cost. For ITER-size simulations with 0.1 trillion grids on 1,440 CPUs/GPUs, the proposed method shows 2.8x and 1.9x speedups over the conventional non-CA Krylov method on Fugaku and Summit, respectively, and excellent strong scaling is obtained up to 5,760 CPUs/GPUs.

Journal Articles

Interactive in-situ steering and visualization of GPU-accelerated simulations using particle-based volume rendering

Kawamura, Takuma; Hasegawa, Yuta; Idomura, Yasuhiro

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.187 - 192, 2020/10

To realize atmospheric dispersion prediction of pollutants, a fluid simulation code based on adaptive mesh refinement (AMR) and optimized for GPU supercomputers has been developed, and interactive visualization and parameter steering of the simulation results are needed. In this study, we extend a particle-based in-situ visualization method for structured grids to AMR, and enable in-situ steering of simulation parameters through a file-based control mechanism. By combining the developed method with a plume dispersion simulation in urban areas running on a GPU platform, it is shown that human-in-the-loop pollution source search is possible without exhaustive parameter scanning.
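
The file-based control mechanism can be pictured as the simulation polling a small parameter file that the visualization front end writes. A hypothetical sketch (the file name, format, and parameter names are illustrative, not taken from the paper):

```python
import json, os

PARAM_FILE = "steer.json"  # hypothetical control file written by the viewer

def poll_steering(params):
    """Merge user-updated parameters into the running simulation, if any."""
    if os.path.exists(PARAM_FILE):
        with open(PARAM_FILE) as f:
            params.update(json.load(f))
        os.remove(PARAM_FILE)  # consume the steering request
    return params

params = {"source_x": 0.0, "source_y": 0.0, "emission_rate": 1.0}
for step in range(100):
    params = poll_steering(params)   # human-in-the-loop update between steps
    # advance_plume_dispersion(params)  # placeholder for the solver step
    # emit_particle_rendering()         # placeholder for in-situ visualization
```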

Journal Articles

Communication-avoiding Krylov solvers for extreme scale nuclear CFD simulations

Idomura, Yasuhiro; Ina, Takuya*; Ali, Y.*; Imamura, Toshiyuki*

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.225 - 230, 2020/10

A new communication-avoiding (CA) Krylov solver with an FP16 (half precision) preconditioner is developed for a semi-implicit finite difference solver in the Gyrokinetic Toroidal 5D full-$$f$$ Eulerian code GT5D. In the solver, the bottleneck of global collective communication is resolved using a CA Krylov subspace method, while halo data communication is reduced by improving the convergence property with the FP16 preconditioner. The FP16 preconditioner is designed based on the physics properties of the operator and is implemented using the new support for FP16 SIMD operations on A64FX. The solver is ported to Fugaku (A64FX) and Summit (V100), which show $$\sim$$63x and $$\sim$$29x speedups in socket performance, respectively, compared to the conventional non-CA Krylov solver on JAEA-ICEX (Haswell).

Journal Articles

Ensemble wind simulations using a mesh-refined lattice Boltzmann method on GPU-accelerated systems

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.236 - 242, 2020/10

The wind conditions and plume dispersion in urban areas are strongly affected by buildings and plants, which are hardly resolved in conventional mesoscale simulations. To resolve this issue, we developed a GPU-based CFD code using a mesh-refined lattice Boltzmann method (LBM), which enables real-time plume dispersion simulations with a resolution of several meters. However, such high-resolution simulations are highly turbulent, and the time histories of the results are sensitive to various simulation conditions. To improve the reliability of such chaotic simulations, we developed an ensemble simulation approach, which enables a statistical estimation of the uncertainty. We examined the developed code against the field experiment JU2003 in Oklahoma City. In the comparison, the wind conditions showed good agreement, and the averaged tracer gas concentrations satisfied the factor-2 agreement between the ensemble simulation data and the experiment.
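
The statistical treatment amounts to reducing an ensemble of chaotic runs to a mean and a spread before comparing with the experiment. A minimal sketch (synthetic data; the member count and estimator choices are assumptions, not the paper's):

```python
import numpy as np

def ensemble_stats(runs):
    """runs: (n_members, n_samples) array of, e.g., tracer concentrations."""
    runs = np.asarray(runs)
    mean = runs.mean(axis=0)            # ensemble average to compare with data
    spread = runs.std(axis=0, ddof=1)   # sample std as an uncertainty estimate
    return mean, spread

members = np.random.lognormal(mean=0.0, sigma=0.5, size=(32, 100))
mean, spread = ensemble_stats(members)
print(mean[:3], spread[:3])
```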

Journal Articles

GPU-acceleration of locally mesh allocated two phase flow solver for nuclear reactors

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Yamashita, Susumu; Shimokawabe, Takashi*; Aoki, Takayuki*

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.210 - 215, 2020/10

This paper presents a GPU-based Poisson solver on a block-based adaptive mesh refinement (block-AMR) framework. The block-AMR method is essential for GPU computation and for an efficient description of the nuclear reactor geometry. We successfully implement a conjugate gradient method with a state-of-the-art multigrid preconditioner (MG-CG) on the block-AMR framework. GPU kernel performance was measured on the GPU-based supercomputer TSUBAME3.0. The vector-vector sum, matrix-vector product, and dot product kernels in the CG solver reach about 60% of the peak memory bandwidth. In the MG kernel, the smoothers in a three-stage V-cycle MG method are implemented using a mixed-precision RB-SOR method, which also shows good performance. For a large-scale Poisson problem with $$453.0\times 10^6$$ cells, the developed MG-CG method reduces the number of iterations to less than 30% of that of the original preconditioned CG method and achieves a $$2.5\times$$ speedup.
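
Structurally, the MG-CG solver is an ordinary preconditioned CG loop in which the preconditioner call is a multigrid V-cycle. A compact sketch with the V-cycle replaced by a Jacobi stand-in (the paper's actual preconditioner is a three-stage V-cycle with mixed-precision RB-SOR smoothers):

```python
import numpy as np

def pcg(A, b, precond, tol=1e-8, max_iter=200):
    """Preconditioned conjugate gradient; precond(r) plays the V-cycle role."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_next = r @ z
        p = z + (rz_next / rz) * p
        rz = rz_next
    return x

# 1D Poisson test; a stronger preconditioner cuts iterations, as reported above.
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = pcg(A, np.ones(n), precond=lambda r: r / 2.0)  # Jacobi stand-in for the MG V-cycle
```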

Journal Articles

Estimation of air dose rate using measurement results of monitoring posts in Fukushima Prefecture

Seki, Akiyuki; Mayumi, Akie; Wainwright-Murakami, Haruko*; Saito, Kimiaki; Takemiya, Hiroshi; Idomura, Yasuhiro

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.158 - 164, 2020/10

We developed a method to estimate the temporal change of the air dose rate at locations with temporally sparse measurements by using continuous measurement data from a nearby monitoring post. The method determines an observation model from the correlation between the sparse data at the target location and the dense data at the monitoring post based on a hierarchical Bayesian model. The developed method was validated against the air dose rates measured at monitoring posts in Fukushima Prefecture from 2012 to 2017. The results showed that the method can predict the air dose rate at almost all target locations with an error of less than 10%.
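
The core of the method is an observation model mapping the dense monitoring-post series to the sparse target-site series. As a heavily simplified stand-in for the hierarchical Bayesian model, a log-linear least-squares fit illustrates the idea (the linear-in-log form and the sample values are assumptions for this sketch):

```python
import numpy as np

def fit_observation_model(post, target):
    """Fit log(target) ~ a*log(post) + b and return a predictor."""
    X = np.column_stack([np.log(post), np.ones(len(post))])
    (a, b), *_ = np.linalg.lstsq(X, np.log(target), rcond=None)
    return lambda p: np.exp(a * np.log(p) + b)

post = np.array([0.20, 0.18, 0.15, 0.12])    # dense monitoring-post dose rates
target = np.array([0.31, 0.28, 0.24, 0.20])  # sparse target-site measurements
predict = fit_observation_model(post, target)
print(predict(np.array([0.10])))             # estimate at an unmeasured time
```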

Journal Articles

Communication avoiding multigrid preconditioned conjugate gradient method for extreme scale multiphase CFD simulations

Idomura, Yasuhiro; Onodera, Naoyuki; Yamada, Susumu; Yamashita, Susumu; Ina, Takuya*; Imamura, Toshiyuki*

Supa Kompyuthingu Nyusu, 22(5), p.18 - 29, 2020/09

A communication-avoiding multigrid preconditioned conjugate gradient method (CAMGCG) is applied to the pressure Poisson equation in the multiphase CFD code JUPITER, and its computational performance and convergence properties are compared against conventional Krylov methods. The CAMGCG solver has robust convergence properties regardless of the problem size, and achieves both communication reduction and convergence improvement, leading to a higher performance gain than CA Krylov solvers, which achieve only the former. The CAMGCG solver is applied to extreme-scale multiphase CFD simulations with 90 billion DOFs, and its performance is compared against the preconditioned CG solver. In this benchmark, the number of iterations is reduced to $$\sim 1/800$$, and a $$\sim 11.6\times$$ speedup is achieved while keeping excellent strong scaling up to 8,000 nodes on the Oakforest-PACS.

Journal Articles

Improvement in interactive remote in situ visualization using SIMD-aware function parser and asynchronous data I/O

Kawamura, Takuma; Idomura, Yasuhiro

Journal of Visualization, 23(4), p.695 - 706, 2020/08

Times Cited Count: 0, Percentile: 100 (Computer Science, Interdisciplinary Applications)

An in-situ visualization system based on particle-based volume rendering offers a highly scalable and flexible visual analytics environment with multivariate volume rendering. Although it showed excellent computational performance on conventional CPU platforms, accelerated computation on the latest many-core platforms revealed performance bottlenecks in the function parser and the particle I/O. In this paper, we develop a new SIMD-aware function parser and an asynchronous data I/O method based on task-based thread parallelization. Numerical experiments on the Oakforest-PACS, which consists of 8,208 Intel Xeon Phi 7250 (Knights Landing) processors, demonstrate an order-of-magnitude speedup while maintaining improved strong scaling up to $$\sim$$100k cores.
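
The asynchronous I/O idea is to hand the particle write to a worker so that the next visualization step overlaps with the disk transfer. A minimal thread-based sketch (the paper uses task-based thread parallelization inside its in-situ pipeline; the file names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

io_pool = ThreadPoolExecutor(max_workers=1)  # dedicated I/O worker
pending = None

for step in range(5):
    particles = np.random.rand(100_000, 4)   # stand-in for rendered particles
    if pending is not None:
        pending.result()                     # previous write must finish first
    # snapshot the buffer and overlap the write with the next compute step
    pending = io_pool.submit(np.save, f"particles_{step:04d}.npy", particles.copy())
    # ... generate the next frame's particles here ...

if pending is not None:
    pending.result()
io_pool.shutdown()
```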

Journal Articles

Self-organization of zonal flows and isotropic eddies in toroidal electron temperature gradient driven turbulence

Kawai, Chika*; Idomura, Yasuhiro; Ogawa, Yuichi*; Yamada, Hiroshi*

Physics of Plasmas, 27(8), p.082302_1 - 082302_11, 2020/08

Times Cited Count: 0, Percentile: 100 (Physics, Fluids & Plasmas)

Self-organization in toroidal electron temperature gradient driven (ETG) turbulence is investigated based on a global gyrokinetic model in a weak magnetic shear configuration. Because of global profile effects, toroidal ETG modes with higher toroidal mode number $$n$$ are excited at the outer magnetic surfaces, leading to strong linear wave dispersion. The resulting anisotropic wave turbulence boundary and the inverse energy cascade generate the self-organization of zonal flows, a mechanism unique to the global gyrokinetic model. The self-organization is confirmed both in decaying turbulence initialized by random noise and in the driven toroidal ETG turbulence. It is also shown that the self-organization process generates zonal flows or isotropic eddies depending on a criterion parameter, which is determined by the ion-to-electron temperature ratio and the turbulence intensity.

Journal Articles

Ensemble wind simulation using a mesh-refined lattice Boltzmann method

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Dai-25-Kai Nippon Keisan Kogaku Koenkai Rombunshu (CD-ROM), 4 Pages, 2020/06

We developed a GPU-based CFD code using a mesh-refined lattice Boltzmann method (LBM), which enables ensemble simulations of wind and plume dispersion in urban areas. The code is tuned for the Pascal and Volta GPU architectures and performs real-time wind simulations over a several-kilometer-square region with a grid resolution of several meters. We examined the developed code against the field experiment JU2003 in Oklahoma City. In the comparison, the wind conditions showed good agreement, and the ensemble-averaged and maximum values of the tracer concentration satisfied the factor-2 agreement.

Journal Articles

GPU-acceleration of locally mesh allocated Poisson solver

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*; Aoki, Takayuki*

Dai-25-Kai Nippon Keisan Kogaku Koenkai Rombunshu (CD-ROM), 4 Pages, 2020/06

We have developed the stencil-based CFD code JUPITER for simulating three-dimensional multiphase flows. A GPU-accelerated Poisson solver based on the preconditioned conjugate gradient (P-CG) method with a multigrid preconditioner was developed for JUPITER with a block-structured AMR mesh. All Poisson kernels were implemented in CUDA and are well tuned to achieve high performance on GPU supercomputers. The developed multigrid solver shows good convergence, reducing the number of iterations to about 1/7 of that of the original P-CG method, and a $$3\times$$ speedup is achieved in a strong scaling test from 8 to 216 GPUs on TSUBAME 3.0.

Journal Articles

Locally mesh-refined lattice Boltzmann method for fuel debris air cooling analysis on GPU supercomputer

Onodera, Naoyuki; Idomura, Yasuhiro; Uesawa, Shinichiro; Yamashita, Susumu; Yoshida, Hiroyuki

Mechanical Engineering Journal (Internet), 7(3), p.19-00531_1 - 19-00531_10, 2020/06

A dry method is one of the practical methods for decommissioning TEPCO's Fukushima Daiichi Nuclear Power Station. The Japan Atomic Energy Agency (JAEA) has been evaluating the air cooling performance of the fuel debris by using the JUPITER code, based on an incompressible fluid model, and the CityLBM code, based on the lattice Boltzmann method (LBM). However, these codes were based on a uniform Cartesian grid system and required large computational time and cost to capture complicated debris structures. We develop an adaptive mesh refinement (AMR) version of the CityLBM code on GPU-based supercomputers and apply it to thermal-hydraulics problems. The proposed method is validated against free convective heat transfer experiments at JAEA. It is also shown that the AMR-based CityLBM code on four NVIDIA Tesla V100 GPUs gives a 6.7x speedup in time-to-solution compared with the JUPITER code on 36 Intel Xeon E5-2680v3 CPUs.

Journal Articles

Overlapping communications in gyrokinetic codes on accelerator-based platforms

Asahi, Yuichi*; Latu, G.*; Bigot, J.*; Maeyama, Shinya*; Grandgirard, V.*; Idomura, Yasuhiro

Concurrency and Computation; Practice and Experience, 32(5), p.e5551_1 - e5551_21, 2020/03

Times Cited Count: 0, Percentile: 100 (Computer Science, Software Engineering)

Two five-dimensional gyrokinetic codes, GYSELA and GKV, were ported to modern accelerators, the Xeon Phi KNL and the Tesla P100 GPU. Serial computing kernels of GYSELA on KNL and of GKV on the P100 GPU were respectively 1.3x and 7.4x faster than those on a single Skylake processor. Scaling tests of GYSELA and GKV were performed from 16 to 512 KNLs and from 32 to 256 P100 GPUs, respectively, and the data transpose communications in the semi-Lagrangian kernels of GYSELA and in the convolution kernels of GKV were found to be the main bottlenecks. To mitigate the communication costs, pipeline-based and task-based communication overlapping were implemented in these codes.
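
Pipeline- and task-based overlap both boil down to posting non-blocking transfers and computing on data that does not depend on them. A generic mpi4py sketch of the pattern (not the actual GYSELA/GKV implementation; the array shapes are illustrative):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

f = np.random.rand(64, 64)            # local block of the distribution function
send = np.ascontiguousarray(f[0])     # boundary plane to ship out
recv = np.empty_like(send)

# post the halo exchange, then compute on the interior while it is in flight
reqs = [comm.Isend(send, dest=left), comm.Irecv(recv, source=right)]
interior = 2.0 * f[1:-1]              # stand-in for halo-independent work
MPI.Request.Waitall(reqs)
boundary = 2.0 * recv                 # finish the halo-dependent part
```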

Journal Articles

Implementation and performance evaluation of a communication-avoiding GMRES method for stencil-based code on GPU cluster

Matsumoto, Kazuya*; Idomura, Yasuhiro; Ina, Takuya*; Mayumi, Akie; Yamada, Susumu

Journal of Supercomputing, 75(12), p.8115 - 8146, 2019/12

Times Cited Count: 0, Percentile: 100 (Computer Science, Hardware & Architecture)

A communication-avoiding generalized minimum residual method (CA-GMRES) is implemented on a hybrid CPU-GPU cluster, targeting performance acceleration of the iterative linear system solver in the gyrokinetic toroidal five-dimensional Eulerian code GT5D. In addition to CA-GMRES, we implement and evaluate a modified variant (M-CA-GMRES) proposed in our previous study to reduce the amount of floating-point calculation. This study demonstrates that the beneficial features of CA-GMRES are its minimal number of collective communications and its highly efficient calculations based on dense matrix-matrix operations. The performance evaluation is conducted on the Reedbush-L GPU cluster, which contains four NVIDIA Tesla P100 GPUs per compute node. The evaluation results show that M-CA-GMRES is 1.09x, 1.22x, and 1.50x faster than CA-GMRES, the generalized conjugate residual method (GCR), and GMRES, respectively, when 64 GPUs are used.
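
The communication-avoiding ingredient is the s-step basis: build s Krylov vectors with a matrix-powers kernel, then orthogonalize them in one block operation, so the global reduction happens once per s iterations rather than every iteration. A monomial-basis sketch (CA-GMRES in practice uses a Newton basis for numerical stability, an aspect simplified away here):

```python
import numpy as np

def matrix_powers(A, v, s):
    """Return [v, Av, ..., A^s v] with no intermediate global reductions."""
    V = np.empty((len(v), s + 1))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(s):
        V[:, j + 1] = A @ V[:, j]
    return V

n, s = 50, 5
A = 2.0 * np.eye(n) - np.eye(n, k=1)
V = matrix_powers(A, np.random.rand(n), s)
Q, R = np.linalg.qr(V)  # one tall-skinny QR replaces s orthogonalization steps
```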

Journal Articles

Isotope and plasma size scaling in ion temperature gradient driven turbulence

Idomura, Yasuhiro

Physics of Plasmas, 26(12), p.120703_1 - 120703_5, 2019/12

Times Cited Count: 1, Percentile: 72.04 (Physics, Fluids & Plasmas)

This Letter presents the impacts of the hydrogen isotope mass and the normalized gyroradius $$\rho^*$$ on L-mode-like hydrogen (H) and deuterium (D) plasmas dominated by ion temperature gradient driven (ITG) turbulence using global full-$$f$$ gyrokinetic simulations. In ion-heated numerical experiments with adiabatic electrons, the energy confinement time shows almost no isotope mass dependency and is determined by Bohm-like $$\rho^*$$ scaling. Electron-heated numerical experiments with kinetic electrons show a clear isotope mass dependency caused by the isotope effect on the collisional energy transfer from electrons to ions, and the H and D plasmas show similar ion and electron temperature profiles at an H to D heating power ratio of $$\sim 1.4$$. The normalized collisionless ion gyrokinetic equations for H and D plasmas become identical at the same $$\rho^*$$, and collisions only weakly affect ITG turbulence. Therefore, the isotope mass dependency is mainly determined by the $$\rho^*$$ scaling and the heating sources.
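
For orientation, the standard definitions behind these scalings (textbook notation, not quoted from the Letter): the normalized gyroradius is

$$\rho^* = \rho_i / a,$$

the ratio of the ion gyroradius to the plasma minor radius, and the Bohm and gyro-Bohm diffusivity scalings read

$$\chi_B \sim \frac{T}{eB}, \qquad \chi_{gB} \sim \rho^*\,\chi_B,$$

so Bohm-like $$\rho^*$$ scaling means the effective diffusivity does not shrink with $$\rho^*$$ as the gyro-Bohm (local) limit would predict.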

Journal Articles

GPU acceleration of communication avoiding Chebyshev basis conjugate gradient solver for multiphase CFD simulations

Ali, Y.*; Onodera, Naoyuki; Idomura, Yasuhiro; Ina, Takuya*; Imamura, Toshiyuki*

Proceedings of 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2019), p.1 - 8, 2019/11

Times Cited Count: 3, Percentile: 1.37

Iterative methods for solving large linear systems are common parts of computational fluid dynamics (CFD) codes. The preconditioned conjugate gradient (P-CG) method is one of the most widely used iterative methods. However, in the P-CG method, global collective communication is a crucial bottleneck, especially on accelerated computing platforms. To resolve this issue, communication-avoiding (CA) variants of the P-CG method are becoming increasingly important. In this paper, the P-CG and preconditioned Chebyshev basis CA CG (P-CBCG) solvers in the multiphase CFD code JUPITER are ported to the latest V100 GPUs. All GPU kernels are highly optimized to achieve about 90% of the roofline performance, the block Jacobi preconditioner is re-designed to extract the high computing power of GPUs, and the remaining bottleneck of halo data communication is hidden by overlapping communication and computation. The overall performance of the P-CG and P-CBCG solvers is determined by the competition between the CA properties of the global collective communication and the halo data communication, indicating the importance of the inter-node interconnect bandwidth per GPU. The developed GPU solvers are accelerated by up to 2x compared with the former CPU solvers on KNLs, and excellent strong scaling is achieved up to 7,680 GPUs on Summit.

Journal Articles

Exascale simulations of fusion plasmas

Idomura, Yasuhiro; Watanabe, Tomohiko*; Todo, Yasushi*

Shimyureshon, 38(2), p.79 - 86, 2019/06

We promote research and development of exascale fusion plasma simulations on the Post-K computer, aiming at the estimation and prediction of core plasma performance and the exploration of improved operation scenarios for the next-generation fusion experimental reactor ITER. In this paper, we review the developed exascale simulation technologies and the outcomes of validation studies on existing experimental devices, and discuss perspectives on exascale fusion plasma simulations on Post-K.
