Search Results: Records 1-20 displayed on this page of 56

Journal Articles

Plume dispersion simulation based on ensemble simulation with lattice Boltzmann method

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Dai-34-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Internet), 3 Pages, 2020/12

We developed a real-time ensemble simulation code for analyzing urban wind conditions and plume dispersion using a locally mesh-refined lattice Boltzmann method. We validated the developed code against a wind tunnel experiment by AIST and against the field experiment JU2003 in Oklahoma City. For the wind tunnel experiment, the wind conditions showed good agreement with the measurements, and 61.2% of the tracer gas concentration data observed on the ground satisfied the FACTOR2 condition, an accuracy criterion given by the environmental assessment guideline. For the field experiment JU2003, the instantaneous wind speed showed good agreement with the experiment, while the wind direction showed differences of up to 100$$^{\circ}$$. The mean tracer gas concentrations satisfied the FACTOR2 condition at all observation intervals. These results demonstrate that the developed code is accurate enough for environmental assessment.
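For reference, the FACTOR2 condition used above is conventionally defined as the fraction of paired predicted/observed values that agree within a factor of two. A minimal Python sketch of that check (this generic definition is standard model-evaluation practice and is an assumption here, not text from the paper):

    import numpy as np

    def factor2_fraction(predicted, observed):
        """Fraction of pairs with 0.5 <= predicted/observed <= 2.0.

        Pairs with a non-positive value are excluded, since the ratio
        test is only meaningful for positive concentrations.
        """
        p = np.asarray(predicted, dtype=float)
        o = np.asarray(observed, dtype=float)
        mask = (p > 0) & (o > 0)
        ratio = p[mask] / o[mask]
        return np.mean((ratio >= 0.5) & (ratio <= 2.0))

    # Example: three of four pairs fall within a factor of two.
    print(factor2_fraction([1.0, 0.6, 1.8, 3.0], [1.0, 1.0, 1.0, 1.0]))  # 0.75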

Journal Articles

Ensemble wind simulations using a mesh-refined lattice Boltzmann method on GPU-accelerated systems

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.236 - 242, 2020/10

The wind conditions and plume dispersion in urban areas are strongly affected by buildings and plants, which are hardly described in conventional mesoscale simulations. To resolve this issue, we developed a GPU-based CFD code using a mesh-refined lattice Boltzmann method (LBM), which enables real-time plume dispersion simulations with a resolution of several meters. However, such high-resolution simulations are highly turbulent, and the time histories of the results are sensitive to various simulation conditions. In order to improve the reliability of such chaotic simulations, we developed an ensemble simulation approach, which enables a statistical estimation of the uncertainty. We validated the developed code against the field experiment JU2003 in Oklahoma City. In the comparison, the wind conditions showed good agreement, and the average values of the tracer gas concentration satisfied the factor-2 criterion between the ensemble simulation data and the experiment.
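The statistical estimation mentioned above can be pictured as running many independently perturbed simulations and reducing them to an ensemble mean and an uncertainty band. A hypothetical sketch (the member count, array shapes, and log-normal stand-in data are illustrative assumptions, not the paper's setup):

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for an ensemble: each row is the tracer-concentration
    # time history from one independently perturbed simulation.
    n_members, n_steps = 32, 200
    runs = rng.lognormal(mean=0.0, sigma=0.5, size=(n_members, n_steps))

    ens_mean = runs.mean(axis=0)                   # ensemble-averaged history
    lo, hi = np.percentile(runs, [5, 95], axis=0)  # 90% uncertainty band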

Journal Articles

GPU-acceleration of locally mesh allocated two phase flow solver for nuclear reactors

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Yamashita, Susumu; Shimokawabe, Takashi*; Aoki, Takayuki*

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.210 - 215, 2020/10

This paper presents a GPU-based Poisson solver on a block-based adaptive mesh refinement (block-AMR) framework. The block-AMR method is essential for GPU computation and for an efficient description of the nuclear reactor geometry. We successfully implemented a conjugate gradient method with a state-of-the-art multigrid preconditioner (MG-CG) on the block-AMR framework. GPU kernel performance was measured on the GPU-based supercomputer TSUBAME3.0. The vector-vector sum, matrix-vector product, and dot product kernels in the CG solver reached about 60% of the peak performance. In the MG kernel, the smoothers in a three-stage V-cycle MG method are implemented using a mixed-precision RB-SOR method, which also gave good performance. For a large-scale Poisson problem with $$453.0 \times 10^6$$ cells, the developed MG-CG method reduced the number of iterations to less than 30% of that of the original preconditioned CG method and achieved a $$2.5\times$$ speedup.
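Structurally, MG-CG is ordinary preconditioned CG in which each preconditioner application z = M^{-1}r is a single multigrid V-cycle. The Python sketch below shows that skeleton with the preconditioner passed in as a function; the diagonal (Jacobi) stand-in is only a placeholder, not the paper's RB-SOR-smoothed three-stage V-cycle:

    import numpy as np

    def pcg(A, b, precond, tol=1e-8, max_iter=500):
        """Preconditioned CG; precond(r) approximates A^{-1} r.
        In MG-CG, precond is one multigrid V-cycle per iteration."""
        x = np.zeros_like(b)
        r = b - A @ x
        z = precond(r)
        p = z.copy()
        rz = r @ z
        for k in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol * np.linalg.norm(b):
                return x, k + 1
            z = precond(r)
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x, max_iter

    # 1-D Poisson test matrix; diagonal scaling stands in for the V-cycle.
    n = 64
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    x, iters = pcg(A, np.ones(n), precond=lambda r: r / 2.0)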

Journal Articles

Communication avoiding multigrid preconditioned conjugate gradient method for extreme scale multiphase CFD simulations

Idomura, Yasuhiro; Onodera, Naoyuki; Yamada, Susumu; Yamashita, Susumu; Ina, Takuya*; Imamura, Toshiyuki*

Supa Kompyuthingu Nyusu, 22(5), p.18 - 29, 2020/09

A communication avoiding multigrid preconditioned conjugate gradient method (CAMGCG) is applied to the pressure Poisson equation in the multiphase CFD code JUPITER, and its computational performance and convergence properties are compared against conventional Krylov methods. The CAMGCG solver has robust convergence properties regardless of the problem size, and shows both communication reduction and convergence improvement, leading to a higher performance gain than CA Krylov solvers, which achieve only the former. The CAMGCG solver is applied to extreme-scale multiphase CFD simulations with 90 billion DOFs, and its performance is compared against the preconditioned CG solver. In this benchmark, the number of iterations is reduced to $$\sim 1/800$$, and a $$\sim 11.6\times$$ speedup is achieved while maintaining excellent strong scaling up to 8,000 nodes on the Oakforest-PACS.

Journal Articles

Ensemble wind simulation using a mesh-refined lattice Boltzmann method

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Dai-25-Kai Nippon Keisan Kogaku Koenkai Rombunshu (CD-ROM), 4 Pages, 2020/06

We developed a GPU-based CFD code using a mesh-refined lattice Boltzmann method (LBM), which enables ensemble simulations of wind and plume dispersion in urban areas. The code is tuned for the Pascal or Volta GPU architectures, and is able to perform real-time wind simulations over regions several kilometers square with a grid resolution of several meters. We validated the developed code against the field experiment JU2003 in Oklahoma City. In the comparison, the wind conditions showed good agreement, and the ensemble-averaged and maximum values of the tracer concentration satisfied the factor-2 criterion.

Journal Articles

GPU-acceleration of locally mesh allocated Poisson solver

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*; Aoki, Takayuki*

Dai-25-Kai Nippon Keisan Kogaku Koenkai Rombunshu (CD-ROM), 4 Pages, 2020/06

We have developed the stencil-based CFD code JUPITER for simulating three-dimensional multiphase flows. A GPU-accelerated Poisson solver based on the preconditioned conjugate gradient (P-CG) method with a multigrid preconditioner was developed for JUPITER with a block-structured AMR mesh. All Poisson kernels were implemented in CUDA, and the GPU kernel functions are well tuned to achieve high performance on GPU supercomputers. The developed multigrid solver reduces the number of iterations to about 1/7 of that of the original P-CG method, and a $$3\times$$ speedup is achieved in a strong scaling test from 8 to 216 GPUs on TSUBAME 3.0.

Journal Articles

Locally mesh-refined lattice Boltzmann method for fuel debris air cooling analysis on GPU supercomputer

Onodera, Naoyuki; Idomura, Yasuhiro; Uesawa, Shinichiro; Yamashita, Susumu; Yoshida, Hiroyuki

Mechanical Engineering Journal (Internet), 7(3), p.19-00531_1 - 19-00531_10, 2020/06

A dry method is one of the practical methods for decommissioning TEPCO's Fukushima Daiichi Nuclear Power Station. The Japan Atomic Energy Agency (JAEA) has been evaluating the air cooling performance of the fuel debris by using the JUPITER code, based on an incompressible fluid model, and the CityLBM code, based on the lattice Boltzmann method (LBM). However, these codes were based on a uniform Cartesian grid system, and required large computational time and cost to capture complicated debris structures. We developed an adaptive mesh refinement (AMR) version of the CityLBM code on GPU-based supercomputers and applied it to thermal-hydrodynamics problems. The proposed method is validated against free convective heat transfer experiments at JAEA. It is also shown that the AMR-based CityLBM code on four NVIDIA Tesla V100 GPUs gives a 6.7x speedup in time to solution compared with the JUPITER code on 36 Intel Xeon E5-2680v3 CPUs.
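As background on why LBM codes such as CityLBM map well onto GPUs: each time step is a purely local BGK collision followed by nearest-neighbor streaming, with no global solve. A minimal two-dimensional (D2Q9) Python sketch; the production code would use a three-dimensional lattice with thermal coupling, which this illustration omits:

    import numpy as np

    # D2Q9 lattice velocities and weights.
    c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
                  [1, 1], [-1, 1], [-1, -1], [1, -1]])
    w = np.array([4/9] + [1/9]*4 + [1/36]*4)

    def equilibrium(rho, ux, uy):
        cu = c[:, 0, None, None]*ux + c[:, 1, None, None]*uy
        return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2
                                         - 1.5*(ux**2 + uy**2))

    def stream_collide(f, tau=0.8):
        """One LBM step: local BGK collision, then streaming."""
        rho = f.sum(axis=0)
        ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
        uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
        f += (equilibrium(rho, ux, uy) - f) / tau       # collide (local)
        for i, (cx, cy) in enumerate(c):                # stream (neighbors)
            f[i] = np.roll(np.roll(f[i], cx, axis=0), cy, axis=1)
        return f

    # Uniform fluid at rest on a 64x64 periodic box.
    f = equilibrium(np.ones((64, 64)), np.zeros((64, 64)), np.zeros((64, 64)))
    f = stream_collide(f)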

Journal Articles

Inner and outer-layer similarity of the turbulence intensity profile over a realistic urban geometry

Inagaki, Atsushi*; Wangsaputra, Y.*; Kanda, Manabu*; Yücel, M.*; Onodera, Naoyuki; Aoki, Takayuki*

SOLA (Scientific Online Letters on the Atmosphere) (Internet), 16, p.120 - 124, 2020/00

Times Cited Count: 0; Percentile: 100 (Meteorology & Atmospheric Sciences)

The similarity of the turbulence intensity profile under inner-layer and outer-layer scalings was examined for an urban boundary layer using numerical simulations. The simulations consider a developing neutral boundary layer over realistic building geometry. The computational domain covers a 19.2 km by 4.8 km area and extends up to a height of 1 km with 2-m grids. Several turbulence intensity profiles are defined locally in the computational domain. The inner- and outer-layer scalings work well in reducing the scatter of the turbulence intensity within the inner and outer layers, respectively, regardless of the surface geometry. Although the main scatter among the scaled profiles is attributed to the mismatch between parts of the layer and the scaling parameters, its behavior can also be explained by introducing a non-dimensional parameter consisting of the ratio of length or velocity scales.

Journal Articles

GPU acceleration of communication avoiding Chebyshev basis conjugate gradient solver for multiphase CFD simulations

Ali, Y.*; Onodera, Naoyuki; Idomura, Yasuhiro; Ina, Takuya*; Imamura, Toshiyuki*

Proceedings of 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2019), p.1 - 8, 2019/11

Times Cited Count: 3; Percentile: 1.37

Iterative methods for solving large linear systems are common parts of computational fluid dynamics (CFD) codes. The preconditioned conjugate gradient (P-CG) method is one of the most widely used iterative methods. However, in the P-CG method, global collective communication is a crucial bottleneck, especially on accelerated computing platforms. To resolve this issue, communication avoiding (CA) variants of the P-CG method are becoming increasingly important. In this paper, the P-CG and preconditioned Chebyshev basis CA CG (P-CBCG) solvers in the multiphase CFD code JUPITER are ported to the latest V100 GPUs. All GPU kernels are highly optimized to achieve about 90% of the roofline performance, the block Jacobi preconditioner is redesigned to exploit the high computing power of GPUs, and the remaining bottleneck of halo data communication is hidden by overlapping communication and computation. The overall performance of the P-CG and P-CBCG solvers is determined by the competition between the CA properties of the global collective communication and the halo data communication, indicating the importance of the inter-node interconnect bandwidth per GPU. The developed GPU solvers are accelerated up to 2x compared with the former CPU solvers on KNLs, and excellent strong scaling is achieved up to 7,680 GPUs on Summit.
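The communication-avoiding ingredient of P-CBCG is that s Krylov basis vectors are generated with a Chebyshev three-term recurrence, which requires only matrix-vector products; the inner products (global reductions) can then be batched once per block of s steps rather than once per step. A sketch of just the basis construction (the eigenvalue bounds lmin/lmax and the test matrix are illustrative assumptions):

    import numpy as np

    def chebyshev_basis(A, r, s, lmin, lmax):
        """Return s basis vectors T_k(theta(A)) r for a CA solver,
        where theta maps the spectrum [lmin, lmax] onto [-1, 1]."""
        theta = lambda v: (2.0*(A @ v) - (lmax + lmin)*v) / (lmax - lmin)
        V = [r, theta(r)]
        while len(V) < s:
            V.append(2.0*theta(V[-1]) - V[-2])  # T_{k+1} = 2x T_k - T_{k-1}
        return np.column_stack(V[:s])

    # 1-D Poisson test matrix; its eigenvalues lie in (0, 4).
    n = 32
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    V = chebyshev_basis(A, np.ones(n), s=4, lmin=0.01, lmax=4.0)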

Journal Articles

Development of a structured overset Navier-Stokes solver with a moving grid and full multigrid method

Ohashi, Kunihide*; Hino, Takanori*; Kobayashi, Hiroshi*; Onodera, Naoyuki; Sakamoto, Nobuaki*

Journal of Marine Science and Technology, 24(3), p.884 - 901, 2019/09

Times Cited Count: 2; Percentile: 35.37 (Engineering, Marine)

An unsteady Reynolds-averaged Navier-Stokes solver with a structured overset grid method has been developed. Velocity-pressure coupling is achieved using an artificial compressibility approach, and spatial discretization is based on a finite-volume method (FVM). Body motions are taken into account using a grid deformation technique and grid velocities in the convective term. The full multigrid (FMG) method is applied to obtain fast convergence. The cell flag on a coarse grid level is determined from the cell flag on the fine grid level. In the coarse- and fine-grid calculations of the FMG stage, the data are interpolated up to the finest grid level at each overset update interval; the data are then updated based on the overset relations at the finest grid level and transferred to the coarser grid levels. Computations of flows around a hull form, including an unsteady simulation with regular waves, are demonstrated.

Journal Articles

Fuel debris' air cooling analysis using a lattice Boltzmann method

Onodera, Naoyuki; Idomura, Yasuhiro; Kawamura, Takuma; Uesawa, Shinichiro; Yamashita, Susumu; Yoshida, Hiroyuki

Proceedings of 27th International Conference on Nuclear Engineering (ICONE-27) (Internet), 6 Pages, 2019/05

A dry method is one of the practical methods for decommissioning TEPCO's Fukushima Daiichi Nuclear Power Station. The Japan Atomic Energy Agency (JAEA) has been evaluating the air cooling performance by using the JUPITER code. However, the JUPITER code requires a large computational cost to capture debris structures. To accelerate such CFD analyses, we use the CityLBM code, which is based on the lattice Boltzmann method (LBM) and is highly optimized for GPUs. The CityLBM code is validated against free convective heat transfer experiments at JAEA, and accuracy similar to that of the JUPITER code is confirmed for the prediction of heat transfer and the resulting temperature distributions. It is also shown that the elapsed time of a CityLBM simulation on GPUs is reduced to 1/6 of that of the corresponding JUPITER simulation on CPUs, using the same number of GPUs and CPUs. These results show that the LBM is promising for accelerating thermal convection simulations.

Journal Articles

Communication Reduced Multi-time-step Algorithm for Real-time Wind Simulation on GPU-based Supercomputers

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*

Proceedings of 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2018) (Internet), p.9 - 16, 2018/11

Times Cited Count: 2; Percentile: 14.22

We developed a communication-reduced multi-time-step (CRMT) algorithm for a lattice Boltzmann method (LBM) based on block-structured adaptive mesh refinement (AMR). This algorithm is based on the temporal blocking method, and improves computational efficiency by replacing a communication bottleneck with additional computation. The proposed method is implemented in the extreme-scale airflow simulation code CityLBM, and its impact on scalability is tested on the GPU-based supercomputers TSUBAME and Reedbush. Thanks to the CRMT algorithm, the communication cost is reduced by $$\sim 64\%$$, and weak and strong scaling are improved up to $$\sim 200$$ GPUs. The obtained performance indicates that real-time airflow simulations of an area about 2 km square with a wind speed of 5 m/s are feasible at 1 m resolution.
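The temporal blocking idea behind CRMT can be seen on a toy one-dimensional stencil: receive a halo of width s once, then advance s sub-steps before the next exchange, so messages are s times less frequent (but wider, with some redundant recomputation near subdomain edges). A hypothetical single-process sketch of the bookkeeping; a real implementation exchanges the halos between GPUs with MPI:

    import numpy as np

    def blocked_steps(u, left, right, s, nu=0.25):
        """Advance a 1-D diffusion stencil s steps per halo exchange.

        left/right are halo slabs of width s received from the
        neighbors once per block; each sub-step consumes one halo
        cell, trading extra computation for s-fold fewer messages.
        """
        ext = np.concatenate([left, u, right])
        for _ in range(s):  # the extended array shrinks by 2 per step
            ext = ext[1:-1] + nu * (ext[2:] - 2*ext[1:-1] + ext[:-2])
        return ext          # same length as u after s steps

    # One subdomain of a periodic bar, with halos taken from neighbors.
    n, s = 100, 4
    bar = np.sin(np.linspace(0, 2*np.pi, 3*n, endpoint=False))
    u_next = blocked_steps(bar[n:2*n], bar[n-s:n], bar[2*n:2*n+s], s)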

Journal Articles

Communication avoiding multigrid preconditioned conjugate gradient method for extreme scale multiphase CFD simulations

Idomura, Yasuhiro; Ina, Takuya*; Yamashita, Susumu; Onodera, Naoyuki; Yamada, Susumu; Imamura, Toshiyuki*

Proceedings of 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2018) (Internet), p.17 - 24, 2018/11

Times Cited Count: 0; Percentile: 100

A communication avoiding (CA) multigrid preconditioned conjugate gradient method (CAMGCG) is applied to the pressure Poisson equation in the multiphase CFD code JUPITER, and its computational performance and convergence properties are compared against CA Krylov methods. In the JUPITER code, the CAMGCG solver has robust convergence properties regardless of the problem size, and shows both communication reduction and convergence improvement, leading to a higher performance gain than CA Krylov solvers, which achieve only the former. The CAMGCG solver is applied to extreme-scale multiphase CFD simulations with $$\sim 90$$ billion DOFs, and it is shown that, compared with a preconditioned CG solver, the number of iterations is reduced to $$\sim 1/800$$, and a $$\sim 11.6\times$$ speedup is achieved while maintaining excellent strong scaling up to 8,000 nodes on the Oakforest-PACS.

Journal Articles

Acceleration of plume dispersion simulation using locally mesh-refined lattice Boltzmann method

Onodera, Naoyuki; Idomura, Yasuhiro

Proceedings of 26th International Conference on Nuclear Engineering (ICONE-26) (Internet), 7 Pages, 2018/07

A large-scale simulation of the environmental dynamics of radioactive substances is very important from the viewpoint of nuclear security. Recently, GPUs have been emerging as high-performance devices for realizing large-scale simulations with low power consumption. We designed a plume dispersion simulation based on an AMR-based LBM and measured the performance of the LBM code on the GPU-rich supercomputer TSUBAME 3.0 at Tokyo Tech. We achieved good weak scaling from 4 GPUs to 144 GPUs, and 30 times higher node performance than that of CPUs. The code is validated against a wind tunnel test released by the National Institute of Advanced Industrial Science and Technology (AIST). The computational grids are subdivided by the AMR method, and the total number of grid points is reduced to less than 10% of that of a uniform grid at the finest resolution. Despite the fewer grid points, the turbulence statistics and plume dispersion are in good agreement with the experimental data.
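The grid-point saving quoted above comes from keeping the finest cells only near buildings and the plume while coarser levels cover most of the domain. A toy estimate for a three-level, octree-style 3-D refinement (the coverage fractions below are made-up placeholders; only the "less than 10%" figure is from the paper):

    # Hypothetical volume fraction covered by each level (coarse -> fine);
    # each level halves the cell size, i.e. 8x more cells per unit volume.
    fractions = [0.75, 0.20, 0.05]
    n_finest_equiv = 256 ** 3       # uniform grid at the finest resolution

    levels = len(fractions)
    total = sum(f * n_finest_equiv / 8 ** (levels - 1 - i)
                for i, f in enumerate(fractions))
    print(total / n_finest_equiv)   # ~0.087, i.e. under 10% of the cells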

Journal Articles

Acceleration of wind simulation using locally mesh-refined Lattice Boltzmann Method on GPU-Rich supercomputers

Onodera, Naoyuki; Idomura, Yasuhiro

Lecture Notes in Computer Science 10776, p.128 - 145, 2018/00

Times Cited Count: 6; Percentile: 4.45

We developed a CFD code based on an adaptive mesh-refined lattice Boltzmann method (AMR-LBM). The code was developed on the GPU-rich supercomputer TSUBAME3.0 at Tokyo Tech, and the GPU kernel functions are tuned to achieve high performance on the Pascal GPU architecture. Weak scaling from 1 node to 36 nodes is examined. The GPUs (NVIDIA Tesla P100) achieved more than 10 times higher node performance than the CPUs (Broadwell).

Journal Articles

A Stencil framework to realize large-scale computations beyond device memory capacity on GPU supercomputers

Shimokawabe, Takashi*; Endo, Toshio*; Onodera, Naoyuki; Aoki, Takayuki*

Proceedings of 2017 IEEE International Conference on Cluster Computing (IEEE Cluster 2017) (Internet), p.525 - 529, 2017/09

Stencil-based applications such as CFD have succeeded in obtaining high performance on GPU supercomputers. The problem sizes of these applications are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality-improvement technique using the temporal blocking method with memory swapping between host and device enables large computations beyond the device memory capacity. Our high-productivity stencil framework automatically applies temporal blocking to the boundary exchanges required for stencil computation, and supports automatic memory swapping provided by an MPI/CUDA wrapper library. A framework-based application for airflow in an urban city maintains 80% of its performance even for problem sizes twice as large as the GPU memory capacity, and has demonstrated good weak scalability on the TSUBAME 2.5 supercomputer.

Journal Articles

A Numerical study of turbulence statistics and the structure of a spatially-developing boundary layer over a realistic urban geometry

Inagaki, Atsushi*; Kanda, Manabu*; Ahmad, N. H.*; Yagi, Ayako*; Onodera, Naoyuki; Aoki, Takayuki*

Boundary-Layer Meteorology, 164(2), p.161 - 181, 2017/08

Times Cited Count: 5; Percentile: 63.4 (Meteorology & Atmospheric Sciences)

The applicability of outer-layer scaling is examined by numerical simulation of a developing neutral boundary layer over a realistic building geometry of Tokyo. Large-eddy simulations are carried out over a large computational domain of 19.2 km $$\times$$ 4.8 km $$\times$$ 1 km, with a fine grid spacing (2 m), using the lattice Boltzmann method on massively parallel graphics processing units. The results show that outer-layer features are maintained for the turbulence statistics in the upper part of the boundary layer, as well as for the width of the predominant streaky structures throughout the entire boundary layer. This is caused by the existence of very large streaky structures extending throughout the entire boundary layer, which follow outer-layer scaling with a self-preserving development. We adopt a top-down mechanism in the physical interpretation of these results.

Oral presentation

Large eddy simulation of turbulent flow in fuel assemblies with GPU clusters

Onodera, Naoyuki; Yoshida, Hiroyuki; Takase, Kazuyuki

no journal

no abstract in English

Oral presentation

A Meso scale weather model and LES turbulent flow calculation in the GPU supercomputer TSUBAME2.0

Onodera, Naoyuki; Aoki, Takayuki*; Shimokawabe, Takashi*

no journal

no abstract in English

Oral presentation

Large-eddy simulation of turbulence in complex geometry with GPU clusters

Onodera, Naoyuki; Yoshida, Hiroyuki; Takase, Kazuyuki; Aoki, Takayuki*

no journal

no abstract in English
