Search Results: Records 1-20 displayed on this page of 73


Journal Articles

Development of a surface heat flux model for urban wind simulation using locally mesh-refined lattice Boltzmann method

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Nakayama, Hiromasa

Dai-35-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Internet), 3 Pages, 2021/12

A detailed wind simulation is very important for designing smart cities. Since many tall buildings and complex structures make the air flow turbulent in urban areas, large-scale CFD simulations are needed. We develop a GPU-based CFD code based on a lattice Boltzmann method (LBM) with a block-based adaptive mesh refinement (AMR) method. In order to reproduce real wind conditions, the wind conditions and ground temperature of a mesoscale weather forecasting model are given as boundary conditions. In this research, a surface heat flux model based on the Monin-Obukhov similarity theory was introduced to improve the calculation accuracy. We conducted a detailed wind simulation in Oklahoma City, which reproduced wind conditions in the urban area with good accuracy.
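The Monin-Obukhov-based surface heat flux idea mentioned above can be sketched as a bulk formula. This is a minimal sketch in the neutral limit (stability correction functions omitted); the function name, parameter values, and constants are illustrative assumptions, not taken from the paper.

```python
import math

def surface_heat_flux(u_ref, z_ref, z0, T_surf, T_air,
                      rho=1.2, cp=1005.0, kappa=0.4):
    """Sensible heat flux [W/m^2] from Monin-Obukhov similarity,
    simplified to the neutral limit (log-law only, no stability terms)."""
    log_ratio = math.log(z_ref / z0)
    u_star = kappa * u_ref / log_ratio                  # friction velocity [m/s]
    theta_star = kappa * (T_air - T_surf) / log_ratio   # temperature scale [K]
    return -rho * cp * u_star * theta_star

# A warm ground (T_surf > T_air) gives an upward (positive) flux.
H = surface_heat_flux(u_ref=3.0, z_ref=10.0, z0=0.1, T_surf=300.0, T_air=295.0)
```

In the full similarity theory, `u_star` and `theta_star` would be corrected iteratively with stability functions of z/L; the sketch only shows the structure of the bulk exchange.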

Journal Articles

Tree cutting approach for domain partitioning on forest-of-octrees-based block-structured static adaptive mesh refinement with lattice Boltzmann method

Hasegawa, Yuta; Aoki, Takayuki*; Kobayashi, Hiromichi*; Idomura, Yasuhiro; Onodera, Naoyuki

Parallel Computing, 108, p.102851_1 - 102851_12, 2021/12

The aerodynamics simulation code based on the lattice Boltzmann method (LBM) using forest-of-octrees-based block-structured local mesh refinement (LMR) was implemented, and its performance was evaluated on GPU-based supercomputers. We found that the conventional space-filling-curve-based (SFC) domain partitioning algorithm results in costly halo communication in our aerodynamics simulations. Our new tree cutting approach improved the locality and the topology of the partitioned sub-domains and reduced the communication cost to one-third or one-fourth of that of the original SFC approach. In the strong scaling test, the code achieved a maximum $$\times 1.82$$ speedup at a performance of 2207 MLUPS (mega-lattice updates per second) on 128 GPUs. In the weak scaling test, the code achieved 9620 MLUPS on 128 GPUs with 4.473 billion grid points, while the parallel efficiency was 93.4% from 8 to 128 GPUs.
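The MLUPS figure quoted above is a simple throughput metric: lattice-site updates per wall-clock second, in millions. A minimal sketch (function name and timing value are illustrative assumptions):

```python
def mlups(n_lattice_points, n_steps, elapsed_seconds):
    """Mega-lattice updates per second: (sites * steps) / time / 1e6."""
    return n_lattice_points * n_steps / elapsed_seconds / 1.0e6

# Illustration only: 4.473 billion grid points advanced one step in
# ~0.465 s would correspond to roughly the reported 9620 MLUPS.
rate = mlups(4.473e9, 1, 0.465)
```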

Journal Articles

Coherent eddies transporting passive scalars through the plant canopy revealed by Large-Eddy simulations using the lattice Boltzmann method

Watanabe, Tsutomu*; Takagi, Marie*; Shimoyama, Ko*; Kawashima, Masayuki*; Onodera, Naoyuki; Inagaki, Atsushi*

Boundary-Layer Meteorology, 181(1), p.39 - 71, 2021/10

A double-distribution-function lattice Boltzmann model for large-eddy simulations of a passive scalar field within and above a plant canopy is described. For a top-down scalar, for which the plant canopy serves as a distributed sink, the flux of the scalar near the canopy top is predominantly determined by sweep motions originating far above the canopy. By contrast, scalar ejection events are induced by coherent eddies generated near the canopy top. In this paper, the generation of such eddies is shown to be triggered by the downward approach of massive sweep motions to existing wide regions of weak ejective motions extending from inside to above the canopy.

Journal Articles

AMR-Net: Convolutional neural networks for multi-resolution steady flow prediction

Asahi, Yuichi; Hatayama, Sora*; Shimokawabe, Takashi*; Onodera, Naoyuki; Hasegawa, Yuta; Idomura, Yasuhiro

Proceedings of 2021 IEEE International Conference on Cluster Computing (IEEE Cluster 2021) (Internet), p.686 - 691, 2021/10

We develop a convolutional neural network model to predict multi-resolution steady flow. Based on the state-of-the-art image-to-image translation model pix2pixHD, our model can predict the high-resolution flow field from a set of patched signed distance functions. By patching the high-resolution data, the memory requirements of our model are reduced compared to pix2pixHD.
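The input representation described above, a signed distance function split into patches, can be sketched as follows. This is an illustrative sketch only: the circle geometry, grid size, and patch size are assumptions, not the paper's configuration.

```python
import numpy as np

def circle_sdf(n, cx, cy, r):
    """Signed distance to a circle on an n-by-n grid (negative inside)."""
    y, x = np.mgrid[0:n, 0:n]
    return np.hypot(x - cx, y - cy) - r

def patchify(field, p):
    """Split an n-by-n field into non-overlapping p-by-p patches,
    sketching how patching keeps per-sample memory bounded."""
    n = field.shape[0]
    return field.reshape(n // p, p, n // p, p).swapaxes(1, 2).reshape(-1, p, p)

sdf = circle_sdf(64, 32.0, 32.0, 10.0)
patches = patchify(sdf, 16)   # 16 patches of 16x16 each
```

Each patch would then be fed to the network as an independent sample instead of the full-resolution field.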

Journal Articles

Real-time tracer dispersion simulations in Oklahoma City using the locally mesh-refined lattice Boltzmann method

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Nakayama, Hiromasa; Shimokawabe, Takashi*; Aoki, Takayuki*

Boundary-Layer Meteorology, 179(2), p.187 - 208, 2021/05

 Times Cited Count: 2, Percentile: 87.32 (Meteorology & Atmospheric Sciences)

A plume dispersion simulation code named CityLBM enables real-time simulations over areas of several kilometers by applying an adaptive mesh refinement (AMR) method on GPU supercomputers. We assess plume dispersion problems in the complex urban environment of Oklahoma City (JU2003). Realistic mesoscale wind boundary conditions of JU2003 produced by a Weather Research and Forecasting (WRF) model, building structures, and a plant canopy model are introduced into CityLBM. Ensemble calculations are performed to reduce turbulence uncertainties. The statistics of the plume dispersion field (mean and maximum concentrations) show that ensemble calculations improve the accuracy of the estimation, and the ensemble-averaged concentration values in simulations over 4 km areas with 2-m resolution satisfied factor 2 agreement for 70% of the 24 target measurement points and periods in JU2003.
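The "factor 2 agreement" criterion used above is a standard air-quality validation metric: the fraction of predicted concentrations within a factor of two of the observations. A minimal sketch (function name and sample values are illustrative):

```python
import numpy as np

def fac2(pred, obs):
    """Fraction of predictions p with 0.5*o <= p <= 2*o."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    ok = (pred >= 0.5 * obs) & (pred <= 2.0 * obs)
    return ok.mean()

# 1.0 and 3.9 fall within a factor of two of their observations; 0.2 does not.
score = fac2([1.0, 3.9, 0.2], [1.5, 2.0, 1.0])
```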

Journal Articles

Improved domain partitioning on tree-based mesh-refined lattice Boltzmann method

Hasegawa, Yuta; Aoki, Takayuki*; Kobayashi, Hiromichi*; Idomura, Yasuhiro; Onodera, Naoyuki

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 26, 6 Pages, 2021/05

We introduce an improved domain partitioning method called the "tree cutting approach" for the aerodynamics simulation code based on the lattice Boltzmann method (LBM) with forest-of-octrees-based local mesh refinement (LMR). The conventional domain partitioning algorithm based on the space-filling curve (SFC), which is widely used in LMR, caused costly halo data communication that became a bottleneck of our aerodynamics simulations on GPU-based supercomputers. Our tree cutting approach adopts hybrid domain partitioning, with coarse structured block decomposition and SFC partitioning within each block. This hybrid approach improved the locality and the topology of the partitioned sub-domains and reduced the amount of halo communication to one-third of that of the original SFC approach. The code achieved a $$\times 1.23$$ speedup on 8 GPUs, and a $$\times 1.82$$ speedup at a performance of 2207 MLUPS (mega-lattice updates per second) on 128 GPUs in the strong scaling test.
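The SFC ordering that serves as the baseline above can be sketched with a Morton (Z-order) curve, one common space-filling-curve realization. This is an illustrative sketch of the general technique, not the paper's implementation.

```python
def morton2d(x, y, bits=16):
    """Interleave the bits of integer coordinates (x, y) into a 2-D
    Morton (Z-order) index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bit -> even position
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bit -> odd position
    return code

# Sorting cells by Morton code and cutting the sorted list into equal
# chunks assigns each rank a contiguous, fairly compact sub-domain.
codes = [morton2d(k % 4, k // 4) for k in range(16)]
```

Partitioning by cutting this one-dimensional ordering is simple and load-balanced, but the resulting sub-domain shapes can be ragged, which is the halo-communication cost the tree cutting approach addresses.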

Journal Articles

Acceleration of locally mesh allocated Poisson solver using mixed precision

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Shimokawabe, Takashi*; Aoki, Takayuki*

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 26, 3 Pages, 2021/05

We develop a mixed-precision preconditioner for the pressure Poisson equation in the two-phase flow CFD code JUPITER-AMR. The multi-grid (MG) preconditioner is constructed based on the geometric MG method with a three-stage V-cycle and a cache-reuse SOR (CR-SOR) method at each stage. The numerical experiments are conducted for two-phase flows in a fuel bundle of a nuclear reactor. The MG-CG solver in single precision shows the same convergence histories as in double precision, while requiring about 75% of the double-precision computational time. In the strong scaling test, the MG-CG solver in single precision is accelerated by 1.88 times between 32 and 96 GPUs.
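The general idea of a reduced-precision preconditioner inside a double-precision Krylov iteration can be sketched as follows. This substitutes a Jacobi preconditioner for the paper's multigrid one and is a minimal sketch under that assumption, not the JUPITER-AMR implementation.

```python
import numpy as np

def cg_mixed(A, b, tol=1e-8, iters=200):
    """CG in FP64 with the preconditioner applied in FP32 (Jacobi here
    as a stand-in for the paper's MG/CR-SOR preconditioner)."""
    d_inv = (1.0 / np.diag(A)).astype(np.float32)   # preconditioner data in FP32
    x = np.zeros_like(b)
    r = b - A @ x
    z = (d_inv * r.astype(np.float32)).astype(np.float64)
    p = z.copy()
    rz = r @ z
    for _ in range(iters):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = (d_inv * r.astype(np.float32)).astype(np.float64)  # FP32 apply
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# 1-D Poisson matrix as a small test problem.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = cg_mixed(A, b)
```

The outer iteration keeps full accuracy because only the preconditioner, whose job is approximate, runs in reduced precision.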

Journal Articles

Multi-resolution steady flow prediction with convolutional neural networks

Asahi, Yuichi; Hatayama, Sora*; Shimokawabe, Takashi*; Onodera, Naoyuki; Hasegawa, Yuta; Idomura, Yasuhiro

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 26, 4 Pages, 2021/05

We develop a convolutional neural network model to predict multi-resolution steady flow. Based on the state-of-the-art image-to-image translation model Pix2PixHD, our model can predict the high-resolution flow field from the signed distance function. By patching the high-resolution data, the memory requirements of our model are reduced compared to Pix2PixHD.

Journal Articles

GPU acceleration of multigrid preconditioned conjugate gradient solver on block-structured Cartesian grid

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Yamashita, Susumu; Shimokawabe, Takashi*; Aoki, Takayuki*

Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2021) (Internet), p.120 - 128, 2021/01

 Times Cited Count: 0, Percentile: 0.01

We develop a multigrid preconditioned conjugate gradient (MG-CG) solver for the pressure Poisson equation in the two-phase flow CFD code JUPITER. The MG preconditioner is constructed based on the geometric MG method with a three-stage V-cycle, and an RB-SOR smoother and its variant with cache-reuse optimization (CR-SOR) are applied at each stage. The numerical experiments are conducted for two-phase flows in a fuel bundle of a nuclear reactor. The MG-CG solvers with the RB-SOR and CR-SOR smoothers reduce the number of iterations to less than 15% and 9% of the original preconditioned CG method, leading to 3.1- and 5.9-times speedups, respectively. The obtained performance indicates that the MG-CG solver designed for the block-structured grid is highly efficient and enables large-scale simulations of two-phase flows on GPU-based supercomputers.
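The red-black SOR smoother named above can be sketched for a 2-D Poisson problem. This is a serial, minimal sketch of the smoother alone (grid size, relaxation factor, and boundary treatment are illustrative assumptions; the paper's version is a GPU kernel inside a V-cycle).

```python
import numpy as np

def rb_sor_step(u, f, h, omega=1.5):
    """One red-black SOR sweep for -Laplace(u) = f on a square grid
    (boundary values of u held fixed). Red points then black points,
    so each color can be updated in parallel."""
    n = u.shape[0]
    for color in (0, 1):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:
                    gs = 0.25 * (u[i-1, j] + u[i+1, j] + u[i, j-1]
                                 + u[i, j+1] + h * h * f[i, j])
                    u[i, j] = (1 - omega) * u[i, j] + omega * gs
    return u

# Demo: smooth a zero initial guess for f = 1 on a 17x17 grid.
n = 17
h = 1.0 / (n - 1)
f = np.ones((n, n))
u = np.zeros((n, n))
for _ in range(20):
    u = rb_sor_step(u, f, h)
```

The two-color ordering is what makes the method attractive on GPUs: all points of one color have no data dependence on each other.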

Journal Articles

Plume dispersion simulation based on ensemble simulation with lattice Boltzmann method

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Dai-34-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Internet), 3 Pages, 2020/12

We developed a real-time ensemble simulation code for analyzing urban wind conditions and plume dispersion using a locally mesh-refined lattice Boltzmann method. We validated the developed code against a wind tunnel experiment by AIST and against the field experiment JU2003 in Oklahoma City. In the case of the wind tunnel experiment, the wind conditions showed good agreement with the experiment, and 61.2% of the tracer gas concentration data observed on the ground satisfied the FACTOR2 condition, which is an accuracy criterion given by the environmental assessment guideline. In the case of the field experiment JU2003, the instantaneous wind speed showed good agreement with the experiment, while the wind direction showed differences of up to 100$$^{\circ}$$. The means of the tracer gas concentration satisfied the FACTOR2 condition in all observation intervals. These results demonstrate that the developed code is accurate enough for environmental assessment.

Journal Articles

Performance evaluation of block-structured Poisson solver on GPU, CPU, and ARM processors

Onodera, Naoyuki; Idomura, Yasuhiro; Asahi, Yuichi; Hasegawa, Yuta; Shimokawabe, Takashi*; Aoki, Takayuki*

Dai-34-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Internet), 2 Pages, 2020/12

We develop a multigrid preconditioned conjugate gradient (MG-CG) solver for the pressure Poisson equation in the two-phase flow CFD code JUPITER. The code is written in C++ and CUDA to maintain portability across multiple platforms. The main kernels of the CG solver achieve reasonable performance, reaching 0.4 $$\sim$$ 0.75 of the roofline performance, and the performance of the MG preconditioner is also reasonable on NVIDIA GPUs and Intel CPUs. However, the performance degradation of the SpMV kernel on ARM is significant. It is confirmed that the optimization does not work if function calls are included in the loop.
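The roofline performance referenced above is the standard attainable-performance bound: the lesser of peak compute and memory bandwidth times arithmetic intensity. A minimal sketch (the machine numbers below are illustrative assumptions, not measurements from the paper):

```python
def roofline_gflops(peak_flops, peak_bw, arithmetic_intensity):
    """Attainable GFLOP/s under the roofline model:
    min(peak compute, bandwidth [GB/s] * intensity [flop/byte])."""
    return min(peak_flops, peak_bw * arithmetic_intensity)

# An SpMV-like kernel at ~0.25 flop/byte is bandwidth-bound on most machines.
perf = roofline_gflops(peak_flops=7800.0, peak_bw=900.0,
                       arithmetic_intensity=0.25)
```

Comparing measured kernel throughput against this bound is how figures like "0.4 to 0.75 of the roofline performance" are obtained.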

Journal Articles

Ensemble wind simulations using a mesh-refined lattice Boltzmann method on GPU-accelerated systems

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.236 - 242, 2020/10

The wind conditions and plume dispersion in urban areas are strongly affected by buildings and plants, which are hardly described in conventional mesoscale simulations. To resolve this issue, we developed a GPU-based CFD code using a mesh-refined lattice Boltzmann method (LBM), which enables real-time plume dispersion simulations with a resolution of several meters. However, such high-resolution simulations are highly turbulent, and the time histories of the results are sensitive to various simulation conditions. In order to improve the reliability of such chaotic simulations, we developed an ensemble simulation approach, which enables a statistical estimation of the uncertainty. We examined the developed code against the field experiment JU2003 in Oklahoma City. In the comparison, the wind conditions showed good agreement, and the averaged values of the tracer gas concentration satisfied factor 2 agreement between the ensemble simulation data and the experiment.

Journal Articles

GPU-acceleration of locally mesh allocated two phase flow solver for nuclear reactors

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Yamashita, Susumu; Shimokawabe, Takashi*; Aoki, Takayuki*

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.210 - 215, 2020/10

This paper presents a GPU-based Poisson solver on a block-based adaptive mesh refinement (block-AMR) framework. The block-AMR method is essential for GPU computation and for an efficient description of the nuclear reactor geometry. In this paper, we successfully implement a conjugate gradient method with a state-of-the-art multi-grid preconditioner (MG-CG) on the block-AMR framework. GPU kernel performance was measured on the GPU-based supercomputer TSUBAME3.0. The bandwidth of a vector-vector sum, a matrix-vector product, and a dot product in the CG kernel showed good performance at about 60% of the peak. In the MG kernel, the smoothers in a three-stage V-cycle MG method are implemented using a mixed-precision RB-SOR method, which also showed good performance. For a large-scale Poisson problem with $$453.0 \times 10^6$$ cells, the developed MG-CG method reduced the number of iterations to less than 30% and achieved a $$\times 2.5$$ speedup compared with the original preconditioned CG method.

Journal Articles

Communication avoiding multigrid preconditioned conjugate gradient method for extreme scale multiphase CFD simulations

Idomura, Yasuhiro; Onodera, Naoyuki; Yamada, Susumu; Yamashita, Susumu; Ina, Takuya*; Imamura, Toshiyuki*

Supa Kompyuteingu Nyusu, 22(5), p.18 - 29, 2020/09

A communication avoiding multigrid preconditioned conjugate gradient method (CAMGCG) is applied to the pressure Poisson equation in the multiphase CFD code JUPITER, and its computational performance and convergence properties are compared against conventional Krylov methods. The CAMGCG solver has robust convergence properties regardless of the problem size, and shows both communication reduction and convergence improvement, leading to a higher performance gain than CA Krylov solvers, which achieve only the former. The CAMGCG solver is applied to extreme-scale multiphase CFD simulations with 90 billion DOFs, and its performance is compared against the preconditioned CG solver. In this benchmark, the number of iterations is reduced to $$\sim 1/800$$, and a $$\sim 11.6\times$$ speedup is achieved while maintaining excellent strong scaling up to 8,000 nodes on the Oakforest-PACS.

Journal Articles

Ensemble wind simulation using a mesh-refined lattice Boltzmann method

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 25, 4 Pages, 2020/06

We developed a GPU-based CFD code using a mesh-refined lattice Boltzmann method (LBM), which enables ensemble simulations of wind and plume dispersion in urban areas. The code is tuned for the Pascal and Volta GPU architectures and is able to perform real-time wind simulations over regions several kilometers on a side with a grid resolution of several meters. We examined the developed code against the field experiment JU2003 in Oklahoma City. In the comparison, the wind conditions showed good agreement, and the ensemble-averaged and maximum values of the tracer concentration satisfied factor 2 agreement.

Journal Articles

GPU-acceleration of locally mesh allocated Poisson solver

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*; Aoki, Takayuki*

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 25, 4 Pages, 2020/06

We have developed the stencil-based CFD code JUPITER for simulating three-dimensional multiphase flows. A GPU-accelerated Poisson solver based on the preconditioned conjugate gradient (P-CG) method with a multigrid preconditioner was developed for JUPITER with a block-structured AMR mesh. All Poisson kernels were implemented using CUDA, and the GPU kernel functions are well tuned to achieve high performance on GPU supercomputers. The developed multigrid solver shows good convergence, reducing the number of iterations to about 1/7 of the original P-CG method, and a $$\times 3$$ speedup is achieved in the strong scaling test from 8 to 216 GPUs on TSUBAME 3.0.

Journal Articles

Locally mesh-refined lattice Boltzmann method for fuel debris air cooling analysis on GPU supercomputer

Onodera, Naoyuki; Idomura, Yasuhiro; Uesawa, Shinichiro; Yamashita, Susumu; Yoshida, Hiroyuki

Mechanical Engineering Journal (Internet), 7(3), p.19-00531_1 - 19-00531_10, 2020/06

A dry method is one of the practical methods for decommissioning TEPCO's Fukushima Daiichi Nuclear Power Station. The Japan Atomic Energy Agency (JAEA) has been evaluating the air cooling performance of the fuel debris by using the JUPITER code, based on an incompressible fluid model, and the CityLBM code, based on the lattice Boltzmann method (LBM). However, these codes were based on a uniform Cartesian grid system and required large computational time and cost to capture complicated debris structures. We develop an adaptive mesh refinement (AMR) version of the CityLBM code on GPU-based supercomputers and apply it to thermal-hydrodynamics problems. The proposed method is validated against free convective heat transfer experiments at JAEA. It is also shown that the AMR-based CityLBM code on 4 NVIDIA Tesla V100 GPUs gives a 6.7x speedup in time to solution compared with the JUPITER code on 36 Intel Xeon E5-2680v3 CPUs.

Journal Articles

Simulation of Lagrangian pollutant in Jakarta urban district using Lattice Boltzmann method

Yokouchi, Hiroshi*; Inagaki, Atsushi*; Kanda, Manabu*; Onodera, Naoyuki

Doboku Gakkai Rombunshu, B1 (Suikogaku) (Internet), 76(2), p.I_253 - I_258, 2020/00

A high-resolution pollutant model embedded in the lattice Boltzmann method (LBM) is constructed, focusing on particulate pollutants. The flow field is calculated using the D3Q27 model of the LBM, and the particles are calculated by a Lagrangian method. Using this model, we discuss, as an application, the change in the concentration distribution when there is a huge building (GARUDA) in Jakarta. As a result, we find a relation between differences in particle density and differences in flow velocity due to GARUDA: where the flow velocity in the case without GARUDA is faster, the particle density in that case is reduced. We also find that the velocity near the solid boundary is underestimated and the particle density there is higher than the theoretical value; however, the model is valid far from the solid boundary.
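The Lagrangian half of such a model advances passive particles through the LBM-computed flow field. A minimal sketch with forward Euler time stepping; the function names are assumptions, and `velocity_at` stands in for interpolation from the actual lattice flow field.

```python
import numpy as np

def advect(positions, velocity_at, dt, n_steps):
    """Advance passive Lagrangian particles through a velocity field.
    positions: (N, 3) array; velocity_at: callable mapping positions
    to (N, 3) velocities (here a stand-in for LBM-field interpolation)."""
    p = np.array(positions, float)
    for _ in range(n_steps):
        p += dt * velocity_at(p)   # forward Euler step
    return p

# Uniform unit flow in x: particles translate by dt * n_steps.
uniform = lambda p: np.tile([1.0, 0.0, 0.0], (p.shape[0], 1))
out = advect([[0.0, 0.0, 0.0]], uniform, dt=0.1, n_steps=10)
```

Binning the advected particle positions onto a grid then yields the concentration distributions discussed in the abstract.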

Journal Articles

Inner and outer-layer similarity of the turbulence intensity profile over a realistic urban geometry

Inagaki, Atsushi*; Wangsaputra, Y.*; Kanda, Manabu*; Yücel, M.*; Onodera, Naoyuki; Aoki, Takayuki*

SOLA (Scientific Online Letters on the Atmosphere) (Internet), 16, p.120 - 124, 2020/00

 Times Cited Count: 0, Percentile: 0.01 (Meteorology & Atmospheric Sciences)

The similarity of the turbulence intensity profile under inner-layer and outer-layer scalings was examined for an urban boundary layer using numerical simulations. The simulations consider a developing neutral boundary layer over realistic building geometry. The computational domain covers 19.2 km by 4.8 km and extends up to a height of 1 km with 2-m grids. Several turbulence intensity profiles are defined locally in the computational domain. The inner- and outer-layer scalings work well in reducing the scatter of the turbulence intensity within the inner and outer layers, respectively, regardless of the surface geometry. Although the main scatter among the scaled profiles is attributed to the mismatch between the parts of the layer and the scaling parameters, its behavior can also be explained by introducing a non-dimensional parameter consisting of the ratio of length or velocity scales.

Journal Articles

GPU acceleration of communication avoiding Chebyshev basis conjugate gradient solver for multiphase CFD simulations

Ali, Y.*; Onodera, Naoyuki; Idomura, Yasuhiro; Ina, Takuya*; Imamura, Toshiyuki*

Proceedings of 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2019), p.1 - 8, 2019/11

 Times Cited Count: 6, Percentile: 99.17

Iterative methods for solving large linear systems are a common component of computational fluid dynamics (CFD) codes. The preconditioned conjugate gradient (P-CG) method is one of the most widely used iterative methods. However, in the P-CG method, global collective communication is a crucial bottleneck, especially on accelerated computing platforms. To resolve this issue, communication avoiding (CA) variants of the P-CG method are becoming increasingly important. In this paper, the P-CG and Preconditioned Chebyshev Basis CA CG (P-CBCG) solvers in the multiphase CFD code JUPITER are ported to the latest V100 GPUs. All GPU kernels are highly optimized to achieve about 90% of the roofline performance, the block Jacobi preconditioner is re-designed to extract the high computing power of GPUs, and the remaining bottleneck of halo data communication is avoided by overlapping communication and computation. The overall performance of the P-CG and P-CBCG solvers is determined by the competition between the CA properties of the global collective communication and the halo data communication, indicating the importance of the inter-node interconnect bandwidth per GPU. The developed GPU solvers are accelerated up to 2x compared with the former CPU solvers on KNLs, and excellent strong scaling is achieved up to 7,680 GPUs on the Summit.
