Search Results: Records 1-20 of 23 displayed on this page

Journal Articles

AMR-Net: Convolutional neural networks for multi-resolution steady flow prediction

Asahi, Yuichi; Hatayama, Sora*; Shimokawabe, Takashi*; Onodera, Naoyuki; Hasegawa, Yuta; Idomura, Yasuhiro

Proceedings of 2021 IEEE International Conference on Cluster Computing (IEEE Cluster 2021) (Internet), p.686 - 691, 2021/10

We develop a convolutional neural network model to predict multi-resolution steady flow. Based on the state-of-the-art image-to-image translation model pix2pixHD, our model can predict the high-resolution flow field from a set of patched signed distance functions. By patching the high-resolution data, the memory requirements of our model are reduced compared with pix2pixHD.

Journal Articles

Real-time tracer dispersion simulations in Oklahoma City using the locally mesh-refined lattice Boltzmann method

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Nakayama, Hiromasa; Shimokawabe, Takashi*; Aoki, Takayuki*

Boundary-Layer Meteorology, 179(2), p.187 - 208, 2021/05

Times Cited Count: 2, Percentile: 87.32 (Meteorology & Atmospheric Sciences)

A plume dispersion simulation code named CityLBM enables real-time simulation over several kilometers by applying an adaptive mesh refinement (AMR) method on GPU supercomputers. We assess plume dispersion problems in the complex urban environment of Oklahoma City (JU2003). Realistic mesoscale wind boundary conditions of JU2003 produced by a Weather Research and Forecasting Model (WRF), building structures, and a plant canopy model are introduced to CityLBM. Ensemble calculations are performed to reduce turbulence uncertainties. The statistics of the plume dispersion field, namely the mean and maximum concentrations, show that ensemble calculations improve the accuracy of the estimation, and the ensemble-averaged concentration values in the simulations over 4 km areas with 2-m resolution satisfied factor-2 agreement for 70% of 24 target measurement points and periods in JU2003.
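
As a rough illustration of the factor-2 agreement mentioned in this abstract, the sketch below computes the fraction of predicted values that lie within a factor of two of the observed values. It assumes the commonly used definition 0.5 <= predicted/observed <= 2; the abstract does not spell out its exact criterion. The function name `fac2` and the sample numbers are hypothetical and are not JU2003 data.

```python
def fac2(predicted, observed):
    """Fraction of pairs with 0.5 <= predicted/observed <= 2.

    Pairs with a non-positive observation are skipped (an assumption made
    here so that the ratio is well defined).
    """
    pairs = [(p, o) for p, o in zip(predicted, observed) if o > 0]
    if not pairs:
        raise ValueError("need at least one positive observation")
    hits = sum(1 for p, o in pairs if 0.5 * o <= p <= 2.0 * o)
    return hits / len(pairs)


# Hypothetical concentrations for illustration only.
observed = [1.0, 2.0, 4.0, 8.0]
predicted = [1.5, 0.9, 4.0, 20.0]
score = fac2(predicted, observed)
print(f"FAC2 = {score:.2f}")
```

Under this definition, a score of 0.70 would correspond to the 70% figure quoted in the abstract.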

Journal Articles

Acceleration of locally mesh allocated Poisson solver using mixed precision

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Shimokawabe, Takashi*; Aoki, Takayuki*

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 26, 3 Pages, 2021/05

We develop a mixed-precision preconditioner for the pressure Poisson equation in the two-phase flow CFD code JUPITER-AMR. The multi-grid (MG) preconditioner is constructed based on the geometric MG method with a three-stage V-cycle, and a cache-reuse SOR (CR-SOR) method at each stage. The numerical experiments are conducted for two-phase flows in a fuel bundle of a nuclear reactor. The MG-CG solver in single precision shows the same convergence histories as in double precision, while taking about 75% of the computational time of double precision. In the strong scaling test, the MG-CG solver in single precision is accelerated by 1.88 times between 32 and 96 GPUs.
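
To put the strong-scaling figure in context, the short sketch below converts a measured speedup into a parallel efficiency relative to ideal scaling. The efficiency value is derived arithmetic, not a number reported in the abstract, and the function name `strong_scaling_efficiency` is hypothetical.

```python
def strong_scaling_efficiency(speedup, gpus_base, gpus_scaled):
    """Measured speedup divided by the ideal speedup (ratio of GPU counts)."""
    ideal = gpus_scaled / gpus_base
    return speedup / ideal


# Figures quoted in the abstract: 1.88x speedup going from 32 to 96 GPUs.
efficiency = strong_scaling_efficiency(1.88, 32, 96)
print(f"parallel efficiency = {efficiency:.1%}")
```

With an ideal speedup of 3 for three times as many GPUs, the quoted 1.88 corresponds to an efficiency of roughly 63%.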

Journal Articles

Multi-resolution steady flow prediction with convolutional neural networks

Asahi, Yuichi; Hatayama, Sora*; Shimokawabe, Takashi*; Onodera, Naoyuki; Hasegawa, Yuta; Idomura, Yasuhiro

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 26, 4 Pages, 2021/05

We develop a convolutional neural network model to predict multi-resolution steady flow. Based on the state-of-the-art image-to-image translation model Pix2PixHD, our model can predict the high-resolution flow field from the signed distance function. By patching the high-resolution data, the memory requirements of our model are reduced compared with Pix2PixHD.

Journal Articles

GPU acceleration of multigrid preconditioned conjugate gradient solver on block-structured Cartesian grid

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Yamashita, Susumu; Shimokawabe, Takashi*; Aoki, Takayuki*

Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2021) (Internet), p.120 - 128, 2021/01

Times Cited Count: 0, Percentile: 0.01

We develop a multigrid preconditioned conjugate gradient (MG-CG) solver for the pressure Poisson equation in the two-phase flow CFD code JUPITER. The MG preconditioner is constructed based on the geometric MG method with a three-stage V-cycle, and an RB-SOR smoother and its variant with cache-reuse optimization (CR-SOR) are applied at each stage. The numerical experiments are conducted for two-phase flows in a fuel bundle of a nuclear reactor. The MG-CG solvers with the RB-SOR and CR-SOR smoothers reduce the number of iterations to less than 15% and 9% of the original preconditioned CG method, leading to 3.1- and 5.9-times speedups, respectively. The obtained performance indicates that the MG-CG solver designed for the block-structured grid is highly efficient and enables large-scale simulations of two-phase flows on GPU-based supercomputers.
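
For readers unfamiliar with the RB-SOR smoother named in this abstract, the following is a minimal sketch of red-black successive over-relaxation on a small 2D Poisson problem. It is not the JUPITER implementation, which per the abstract works on a block-structured grid on GPUs inside a multigrid V-cycle; the grid size, the relaxation factor `omega`, and all function names here are assumptions chosen for illustration. The point of the two-colour ordering is that every neighbour of a "red" point is "black", so all points of one colour can be updated independently.

```python
def rb_sor_sweep(u, f, h, omega):
    """One red-black SOR sweep for -laplace(u) = f with fixed boundary values."""
    n = len(u)
    for color in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 != color:
                    continue
                # Gauss-Seidel value from the 5-point stencil
                gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                             + u[i][j - 1] + u[i][j + 1]
                             + h * h * f[i][j])
                # Over-relax towards the Gauss-Seidel value
                u[i][j] += omega * (gs - u[i][j])


def residual_max(u, f, h):
    """Maximum absolute residual of the discrete equation at interior points."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                   - u[i][j - 1] - u[i][j + 1]) / (h * h)
            worst = max(worst, abs(f[i][j] - lap))
    return worst


# Illustrative setup: unit square, zero boundary values, constant source term.
n = 17
h = 1.0 / (n - 1)
u = [[0.0] * n for _ in range(n)]
f = [[1.0] * n for _ in range(n)]

r0 = residual_max(u, f, h)
for _ in range(200):
    rb_sor_sweep(u, f, h, omega=1.5)
r1 = residual_max(u, f, h)
print(f"residual: {r0:.3e} -> {r1:.3e}")
```

Here the sweep is used as a standalone iterative solver so that the residual reduction is easy to observe; in a multigrid preconditioner only a few such sweeps would typically be applied per level.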

Journal Articles

Performance evaluation of block-structured Poisson solver on GPU, CPU, and ARM processors

Onodera, Naoyuki; Idomura, Yasuhiro; Asahi, Yuichi; Hasegawa, Yuta; Shimokawabe, Takashi*; Aoki, Takayuki*

Dai-34-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Internet), 2 Pages, 2020/12

We develop a multigrid preconditioned conjugate gradient (MG-CG) solver for the pressure Poisson equation in the two-phase flow CFD code JUPITER. The code is written in C++ and CUDA to maintain portability across multiple platforms. The main kernels of the CG solver achieve reasonable performance, at 0.4 $$\sim$$ 0.75 of the roofline performance, and the performance of the MG preconditioner is also reasonable on NVIDIA GPU and Intel CPU. However, the performance degradation of the SpMV kernel on ARM is significant. It is confirmed that the optimization does not work if any functions are included in the loop.

Journal Articles

GPU-acceleration of locally mesh allocated two phase flow solver for nuclear reactors

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Yamashita, Susumu; Shimokawabe, Takashi*; Aoki, Takayuki*

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.210 - 215, 2020/10

This paper presents a GPU-based Poisson solver on a block-based adaptive mesh refinement (block-AMR) framework. The block-AMR method is essential for GPU computation and efficient description of the nuclear reactor. In this paper, we successfully implement a conjugate gradient method with a state-of-the-art multi-grid preconditioner (MG-CG) on the block-AMR framework. GPU kernel performance was measured on the GPU-based supercomputer TSUBAME3.0. The bandwidth of a vector-vector sum, a matrix-vector product, and a dot product in the CG kernel gave good performance at about 60% of the peak performance. In the MG kernel, the smoothers in a three-stage V-cycle MG method are implemented using a mixed-precision RB-SOR method, which also gave good performance. For a large-scale Poisson problem with $$453.0 \times 10^6$$ cells, the developed MG-CG method reduced the number of iterations to less than 30% and achieved a $$\times 2.5$$ speedup compared with the original preconditioned CG method.

Journal Articles

GPU-acceleration of locally mesh allocated Poisson solver

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*; Aoki, Takayuki*

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 25, 4 Pages, 2020/06

We have developed the stencil-based CFD code JUPITER for simulating three-dimensional multiphase flows. A GPU-accelerated Poisson solver based on the preconditioned conjugate gradient (P-CG) method with a multigrid preconditioner was developed for JUPITER with a block-structured AMR mesh. All Poisson kernels were implemented using CUDA, and the GPU kernel functions are well tuned to achieve high performance on GPU supercomputers. The developed multigrid solver shows good convergence of about 1/7 compared with the original P-CG method, and a $$\times 3$$ speedup is achieved in a strong scaling test from 8 to 216 GPUs on TSUBAME 3.0.

Journal Articles

Communication Reduced Multi-time-step Algorithm for Real-time Wind Simulation on GPU-based Supercomputers

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*

Proceedings of 9th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2018) (Internet), p.9 - 16, 2018/11

Times Cited Count: 5, Percentile: 94.81

We develop a communication reduced multi-time-step (CRMT) algorithm for a Lattice Boltzmann method (LBM) based on a block-structured adaptive mesh refinement (AMR). This algorithm is based on the temporal blocking method, and can improve computational efficiency by replacing a communication bottleneck with additional computation. The proposed method is implemented on an extreme scale airflow simulation code CityLBM, and its impact on the scalability is tested on GPU-based supercomputers, TSUBAME and Reedbush. Thanks to the CRMT algorithm, the communication cost is reduced by $$\sim 64\%$$, and weak and strong scalings are improved up to $$\sim 200$$ GPUs. The obtained performance indicates that real-time airflow simulations for about a 2 km square area with a wind speed of $$5\,\mathrm{m/s}$$ are feasible using 1 m resolution.

Journal Articles

A Stencil framework to realize large-scale computations beyond device memory capacity on GPU supercomputers

Shimokawabe, Takashi*; Endo, Toshio*; Onodera, Naoyuki; Aoki, Takayuki*

Proceedings of 2017 IEEE International Conference on Cluster Computing (IEEE Cluster 2017) (Internet), p.525 - 529, 2017/09

Stencil-based applications such as CFD have succeeded in obtaining high performance on GPU supercomputers. The problem sizes of these applications are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality improvement technique using the temporal blocking method with memory swapping between host and device enables large computations beyond the device memory capacity. Our high-productivity stencil framework automatically applies temporal blocking to the boundary exchange required for stencil computation and supports automatic memory swapping provided by an MPI/CUDA wrapper library. The framework-based application for the airflow in an urban city maintains 80% of its performance even with a problem size twice as large as the GPU memory capacity and has demonstrated good weak scalability on the TSUBAME 2.5 supercomputer.

Oral presentation

A Meso scale weather model and LES turbulent flow calculation in the GPU supercomputer TSUBAME2.0

Onodera, Naoyuki; Aoki, Takayuki*; Shimokawabe, Takashi*

no journal

no abstracts in English

Oral presentation

An AMR framework for realizing effective high-resolution simulations on multiple GPUs

Shimokawabe, Takashi*; Aoki, Takayuki*; Onodera, Naoyuki

no journal

Recently, grid-based physical simulations with multiple GPUs require effective methods to adapt grid resolution to certain sensitive regions of simulations. In GPU computation, an adaptive mesh refinement (AMR) method is one of the effective methods to compute certain local regions that demand higher accuracy with higher resolution. The AMR method on GPU supercomputers is, however, complicated, and it is necessary to apply various optimizations suitable for GPU supercomputers in order to obtain high performance. To develop applications using the AMR method on GPU supercomputers effectively, we are developing a block-based AMR framework for grid-based applications written in C++ and CUDA. Programmers just write the stencil functions that update a grid point on a Cartesian grid.

Oral presentation

Development of an AMR framework to realize effective high-resolution simulations on multiple GPUs

Shimokawabe, Takashi*; Onodera, Naoyuki

no journal

Recently, grid-based physical simulations with multiple GPUs require effective methods to adapt grid resolution to certain sensitive regions of simulations. In GPU computation, an adaptive mesh refinement (AMR) method is one of the effective methods to compute certain local regions that demand higher accuracy with higher resolution. We are developing a block-based AMR framework for stencil applications written in C++ and CUDA. Programmers just write the stencil functions that update a grid point on a Cartesian grid. The framework executes these functions over a tree-based AMR data structure effectively. The framework supports multiple GPUs and provides C++ classes to exchange halo regions and migrate data between GPUs. In this paper, we describe the programming model and implementation of the AMR framework for multiple GPUs, and show the computation results of the compressible fluid calculation based on the proposed AMR framework.

Oral presentation

Locally mesh-refined lattice Boltzmann method for thermal convective flows

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*

no journal

Thermal flow analysis is one of the important topics for decommissioning TEPCO's Fukushima Daiichi Nuclear Power Station. Japan Atomic Energy Agency (JAEA) has been evaluating the air cooling performance of the fuel debris by using the JUPITER code, which is based on an incompressible fluid model on uniform Cartesian grids. However, the JUPITER code requires a large computational cost to capture complicated debris structures at the actual scale. To accelerate such air cooling analyses, we use the CityLBM code, which is developed using a locally mesh-refined lattice Boltzmann method (LBM) and is highly optimized for GPUs. The CityLBM code is validated against free convective heat transfer experiments at JAEA.

Oral presentation

Communication reduced multi-time-step algorithm for the AMR-based lattice Boltzmann method on GPU-rich supercomputers

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*

no journal

We have developed a communication reduced multi-time-step (CRMT) algorithm for the Post-K supercomputer, and measured the performance on GPU-based supercomputers. This algorithm is based on the temporal blocking method, and can improve computational efficiency by replacing a communication bottleneck with additional computation. The proposed method is easily applied to the explicit time integration scheme, and is implemented on an extreme scale airflow simulation code CityLBM. We evaluate the performance of the CRMT algorithm on GPU-based supercomputers, TSUBAME and Reedbush. Thanks to the CRMT algorithm, the communication cost is reduced by 64%, and weak and strong scaling are improved up to 200 GPUs. The obtained performance indicates that real-time airflow simulations for about a 2 km square area with a wind speed of 5 m/s are feasible using 1 m resolution. We conclude that the CRMT algorithm is indispensable for the AMR-LBM to realize real-time simulation on future exascale systems.

Oral presentation

Tracer dispersion simulation in Oklahoma City using locally mesh-refined Lattice Boltzmann Method

Onodera, Naoyuki; Idomura, Yasuhiro; Kawamura, Takuma; Nakayama, Hiromasa; Shimokawabe, Takashi*

no journal

Plume dispersion simulation is very important for designing smart cities. Since many tall buildings and complex structures make the air flow turbulent in urban areas, large-scale CFD simulations are needed. We develop a GPU-based CFD code based on a Lattice Boltzmann Method (LBM) with a block-based Adaptive Mesh Refinement (AMR) method. The code is tuned to achieve high performance on the Pascal and Volta GPU architectures. We conducted a tracer dispersion simulation in Oklahoma City. The computational boundary conditions are given from real building data and WRF analysis results. By executing this computation, wind conditions in the urban area and details of the plume distribution were reproduced with good accuracy.

Oral presentation

Tracer dispersion simulation using locally-mesh refined lattice Boltzmann method based on observation data

Onodera, Naoyuki; Idomura, Yasuhiro; Kawamura, Takuma; Nakayama, Hiromasa; Shimokawabe, Takashi*; Aoki, Takayuki*

no journal

Simulations of the dispersion of radioactive substances attract high social interest, and they are required to satisfy both speed and accuracy. To perform a real-time simulation with a high-resolution mesh at the scale of human living areas involving alleyways and buildings, it is necessary to develop simulation schemes that can fully utilize high computational performance. In this study, we introduced a nudging-based data assimilation method and a plant canopy model into the lattice Boltzmann method (LBM), and confirmed that the accuracy of plume dispersion simulations for urban areas is improved.

Oral presentation

Steady flow prediction using convolutional neural networks with boundary exchange

Hatayama, Sora*; Shimokawabe, Takashi*; Onodera, Naoyuki

no journal

Computational fluid dynamics (CFD) is widely used as a fluid analysis technique. However, CFD simulations are computationally very expensive, and the execution time for reaching a steady state is long. To solve this problem, we use convolutional neural networks (CNNs), one of the deep learning methods, to predict CFD results. In this research, we provide the method and implementation of steady flow prediction using CNNs with boundary exchange to predict the CFD results in a large area.

Oral presentation

Multigrid Poisson solver for a block-structured adaptive mesh refinement method on CPU and GPU supercomputers

Onodera, Naoyuki; Idomura, Yasuhiro; Asahi, Yuichi; Hasegawa, Yuta; Shimokawabe, Takashi*; Aoki, Takayuki*

no journal

This paper presents performance studies of a multigrid (MG) Poisson solver on a block-structured adaptive mesh refinement (block-AMR) method on CPU and GPU supercomputers. The block-AMR method enables efficient solutions for the nuclear reactor, which is composed of complicated structures. We implement a three-stage V-cycle MG method, and the calculation is accelerated by using mixed-precision techniques. For a large-scale Poisson problem with $$4.53 \times 10^8$$ cells, the developed MG-CG method reduced the number of iterations to less than 30% and achieved a 2-times speedup compared with the original preconditioned CG method on the GPU supercomputer TSUBAME. Performance studies of this kind are useful for designing advanced preconditioners in terms of robustness, computational precision, thread parallelization, and cache size on each architecture.

Oral presentation

High-resolution simulations using an AMR framework on GPU supercomputers

Shimokawabe, Takashi*; Onodera, Naoyuki

no journal

An adaptive mesh refinement (AMR) method is one of the effective methods to compute certain local regions that demand higher accuracy with higher resolution. To develop applications that adopt AMR effectively while maintaining high performance on multiple GPUs, we are developing a block-based AMR framework for stencil applications written in C++ and CUDA. The programmer simply describes a C++11 lambda that updates a grid point, which is applied to all grids of various resolutions over a tree-based AMR data structure effectively. The framework-based application for compressible flow has demonstrated good weak scalability with 84% parallel efficiency on the TSUBAME3.0 GPU supercomputer at Tokyo Institute of Technology.
