Search Results: Records 1-20 displayed on this page of 68


Journal Articles

Tree cutting approach for domain partitioning on forest-of-octrees-based block-structured static adaptive mesh refinement with lattice Boltzmann method

Hasegawa, Yuta; Aoki, Takayuki*; Kobayashi, Hiromichi*; Idomura, Yasuhiro; Onodera, Naoyuki

Parallel Computing, 108, p.102851_1 - 102851_12, 2021/12

The aerodynamics simulation code based on the lattice Boltzmann method (LBM) using forest-of-octrees-based block-structured local mesh refinement (LMR) was implemented, and its performance was evaluated on GPU-based supercomputers. We found that the conventional space-filling-curve-based (SFC) domain partitioning algorithm results in costly halo communication in our aerodynamics simulations. Our new tree cutting approach improved the locality and the topology of the partitioned sub-domains and reduced the communication cost to one-third or one-fourth of that of the original SFC approach. In the strong scaling test, the code achieved a maximum speedup of $$\times 1.82$$ at a performance of 2207 MLUPS (mega-lattice updates per second) on 128 GPUs. In the weak scaling test, the code achieved 9620 MLUPS on 128 GPUs with 4.473 billion grid points, with a parallel efficiency of 93.4% from 8 to 128 GPUs.
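As background for why SFC partitioning can be communication-heavy, here is a minimal sketch of Z-order (Morton) space-filling-curve partitioning on a uniform 2-D grid; the paper's forest-of-octrees setting is more involved, and all function names below are illustrative rather than taken from the code:

```python
def morton2d(x, y, bits=8):
    """Interleave the bits of (x, y) into a Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def sfc_partition(nx, ny, nranks):
    """Order cells along the Z-curve, then cut the 1-D ordering into
    nranks contiguous chunks.  Chunks are balanced in cell count, but
    their shapes (and hence halo surfaces) can be ragged, which is the
    communication-cost problem the tree cutting approach addresses."""
    cells = sorted(((x, y) for x in range(nx) for y in range(ny)),
                   key=lambda c: morton2d(*c))
    chunk = len(cells) // nranks
    return [cells[r * chunk:(r + 1) * chunk] for r in range(nranks)]
```

For power-of-two grids and rank counts the chunks happen to be compact quadrants; for general sizes and refinement levels the chunks straddle quadrant boundaries, inflating the halo.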

Journal Articles

Real-time tracer dispersion simulations in Oklahoma City using the locally mesh-refined lattice Boltzmann method

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Nakayama, Hiromasa; Shimokawabe, Takashi*; Aoki, Takayuki*

Boundary-Layer Meteorology, 179(2), p.187 - 208, 2021/05

 Times Cited Count:2 Percentile:87.32 (Meteorology & Atmospheric Sciences)

A plume dispersion simulation code named CityLBM enables real-time simulations over areas of several kilometers by applying an adaptive mesh refinement (AMR) method on GPU supercomputers. We assess plume dispersion problems in the complex urban environment of Oklahoma City (JU2003). Realistic mesoscale wind boundary conditions of JU2003 produced by the Weather Research and Forecasting (WRF) model, building structures, and a plant canopy model are introduced to CityLBM. Ensemble calculations are performed to reduce turbulence uncertainties. The statistics of the plume dispersion field, namely the mean and maximum concentrations, show that ensemble calculations improve the accuracy of the estimation; the ensemble-averaged concentration values in simulations over 4 km areas with 2-m resolution satisfied factor-2 agreement for 70% of the 24 target measurement points and periods in JU2003.
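The factor-2 agreement quoted above is the standard FAC2 metric for dispersion models. A minimal sketch (not CityLBM code) of both ensemble averaging and FAC2:

```python
def ensemble_mean(runs):
    """Average concentrations point-by-point over ensemble members to
    reduce turbulence-induced uncertainty in any single realization."""
    return [sum(vals) / len(vals) for vals in zip(*runs)]

def fac2(predicted, observed):
    """Fraction of prediction/observation pairs whose ratio lies within
    a factor of two -- the FAC2 agreement metric."""
    hits = sum(1 for p, o in zip(predicted, observed)
               if o > 0 and 0.5 <= p / o <= 2.0)
    return hits / len(observed)
```

A FAC2 of 0.7, as reported above, means 70% of predicted concentrations fell within half to twice the measured values.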

Journal Articles

Improved domain partitioning on tree-based mesh-refined lattice Boltzmann method

Hasegawa, Yuta; Aoki, Takayuki*; Kobayashi, Hiromichi*; Idomura, Yasuhiro; Onodera, Naoyuki

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 26, 6 Pages, 2021/05

We introduce an improved domain partitioning method called the "tree cutting approach" for the aerodynamics simulation code based on the lattice Boltzmann method (LBM) with forest-of-octrees-based local mesh refinement (LMR). The conventional domain partitioning algorithm based on the space-filling curve (SFC), which is widely used in LMR, caused costly halo data communication, which became a bottleneck of our aerodynamics simulations on GPU-based supercomputers. Our tree cutting approach adopts a hybrid domain partitioning: a coarse structured block decomposition combined with SFC partitioning within each block. This hybrid approach improved the locality and the topology of the partitioned sub-domains and reduced the amount of halo communication to one-third of that of the original SFC approach. The code achieved a $$\times 1.23$$ speedup on 8 GPUs, and a $$\times 1.82$$ speedup at a performance of 2207 MLUPS (mega-lattice updates per second) on 128 GPUs in the strong scaling test.
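A toy sketch of the hybrid idea on a uniform 2-D grid: the coarse structured block decomposition fixes each sub-domain's compact shape (an SFC ordering would then be applied to the refined cells inside each block, omitted here). Function names are illustrative, not from the paper's code:

```python
def hybrid_partition(nx, ny, bx, by):
    """Tree-cutting-style coarse decomposition: one rectangular
    sub-domain per block of a bx-by-by coarse block grid."""
    parts = []
    for jb in range(by):
        for ib in range(bx):
            x0, x1 = ib * nx // bx, (ib + 1) * nx // bx
            y0, y1 = jb * ny // by, (jb + 1) * ny // by
            parts.append([(x, y) for x in range(x0, x1)
                                 for y in range(y0, y1)])
    return parts

def halo_cells(part):
    """Count cells with at least one 4-neighbour outside the partition,
    a rough proxy for halo-communication volume."""
    s = set(part)
    return sum(1 for (x, y) in part
               if any(n not in s
                      for n in ((x - 1, y), (x + 1, y),
                                (x, y - 1), (x, y + 1))))
```

For a compact 4$$\times$$4 block only the 12 boundary cells need halo exchange; a ragged SFC chunk of the same 16 cells can expose more of its cells to neighbours, which is the communication saving the tree cutting approach exploits.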

Journal Articles

Acceleration of locally mesh allocated Poisson solver using mixed precision

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Shimokawabe, Takashi*; Aoki, Takayuki*

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 26, 3 Pages, 2021/05

We develop a mixed-precision preconditioner for the pressure Poisson equation in the two-phase flow CFD code JUPITER-AMR. The multigrid (MG) preconditioner is constructed based on the geometric MG method with a three-stage V-cycle and a cache-reuse SOR (CR-SOR) method at each stage. Numerical experiments are conducted for two-phase flows in a fuel bundle of a nuclear reactor. The MG-CG solver in single precision shows the same convergence history as in double precision, while requiring about 75% of the double-precision computational time. In the strong scaling test, the MG-CG solver in single precision is accelerated by 1.88 times between 32 and 96 GPUs.
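The mixed-precision structure can be sketched in a few lines: the outer CG iteration runs in double precision while the preconditioner is applied in single precision. The sketch below is illustrative, not the JUPITER-AMR code; it substitutes a Jacobi preconditioner rounded to binary32 for the paper's multigrid preconditioner, and simulates single precision with `struct` rounding:

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to binary32 precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

def poisson_matvec(v):
    """Matrix-vector product for the 1-D Poisson operator tridiag(-1, 2, -1)."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]

def cg_mixed(matvec, b, precond, tol=1e-6, maxiter=200):
    """Preconditioned CG: double-precision outer loop,
    single-precision preconditioner application."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * len(b)
    r = b[:]
    z = precond(r)
    p = z[:]
    rz = dot(r, z)
    bnorm = dot(b, b) ** 0.5
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol * bnorm:
            break
        z = precond(r)
        rz, rz_old = dot(r, z), rz
        p = [zi + (rz / rz_old) * pi for zi, pi in zip(z, p)]
    return x

# Jacobi (diagonal) preconditioner evaluated in float32.
jacobi_f32 = lambda r: [to_f32(ri / 2.0) for ri in r]
```

On this toy problem the single-precision preconditioner reaches the same tolerance as an all-double run, mirroring the matched convergence histories reported above.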

Journal Articles

GPU acceleration of multigrid preconditioned conjugate gradient solver on block-structured Cartesian grid

Onodera, Naoyuki; Idomura, Yasuhiro; Hasegawa, Yuta; Yamashita, Susumu; Shimokawabe, Takashi*; Aoki, Takayuki*

Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2021) (Internet), p.120 - 128, 2021/01

 Times Cited Count:0 Percentile:0.01

We develop a multigrid preconditioned conjugate gradient (MG-CG) solver for the pressure Poisson equation in the two-phase flow CFD code JUPITER. The MG preconditioner is constructed based on the geometric MG method with a three-stage V-cycle, and an RB-SOR smoother and its variant with cache-reuse optimization (CR-SOR) are applied at each stage. Numerical experiments are conducted for two-phase flows in a fuel bundle of a nuclear reactor. The MG-CG solvers with the RB-SOR and CR-SOR smoothers reduce the number of iterations to less than 15% and 9% of the original preconditioned CG method, leading to 3.1- and 5.9-times speedups, respectively. The obtained performance indicates that the MG-CG solver designed for the block-structured grid is highly efficient and enables large-scale simulations of two-phase flows on GPU-based supercomputers.
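The smoother named above can be illustrated with a one-dimensional red-black SOR sweep (a sketch, not the JUPITER kernel): points of one colour depend only on points of the other colour, so each half-sweep is fully parallel, which is what makes RB-SOR GPU-friendly.

```python
def rb_sor_sweep(u, f, h, omega=1.5):
    """One red-black SOR sweep for -u'' = f on a 1-D grid with fixed
    (Dirichlet) end values.  Odd-indexed interior points are relaxed
    first, then even-indexed ones; updates within one colour are
    mutually independent."""
    n = len(u)
    for start in (1, 2):          # the two colours, by index parity
        for i in range(start, n - 1, 2):
            gs = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            u[i] += omega * (gs - u[i])
    return u
```

Repeated sweeps on the Laplace problem (f = 0) drive u toward the linear profile fixed by its end values; in the solver above the smoother is applied a few times per V-cycle stage rather than iterated to convergence.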

Journal Articles

Performance evaluation of block-structured Poisson solver on GPU, CPU, and ARM processors

Onodera, Naoyuki; Idomura, Yasuhiro; Asahi, Yuichi; Hasegawa, Yuta; Shimokawabe, Takashi*; Aoki, Takayuki*

Dai-34-Kai Suchi Ryutai Rikigaku Shimpojiumu Koen Rombunshu (Internet), 2 Pages, 2020/12

We develop a multigrid preconditioned conjugate gradient (MG-CG) solver for the pressure Poisson equation in the two-phase flow CFD code JUPITER. The code is written in C++ and CUDA to keep portability across multiple platforms. The main kernels of the CG solver achieve reasonable performance, 0.4 $$\sim$$ 0.75 of the roofline performance, and the performance of the MG preconditioner is also reasonable on NVIDIA GPUs and Intel CPUs. However, the performance degradation of the SpMV kernel on the ARM processor is significant: it is confirmed that the optimization does not work when function calls are included in the loop.
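The roofline comparison used above is one line of arithmetic: attainable performance is the minimum of the compute peak and memory bandwidth times arithmetic intensity. The numbers below are illustrative, not measurements from the paper:

```python
def roofline_gflops(peak_gflops, bw_gbs, intensity_flop_per_byte):
    """Attainable performance under the roofline model:
    min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bw_gbs * intensity_flop_per_byte)

# A memory-bound SpMV-like kernel (~0.25 flop/byte) on a 900 GB/s GPU
# is capped by bandwidth, far below a multi-TFLOPS compute peak.
spmv_cap = roofline_gflops(7000.0, 900.0, 0.25)   # -> 225.0 GFLOPS
```

Measured throughput divided by this cap gives the 0.4 $$\sim$$ 0.75 roofline fractions quoted above.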

Journal Articles

GPU-acceleration of locally mesh allocated two phase flow solver for nuclear reactors

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Yamashita, Susumu; Shimokawabe, Takashi*; Aoki, Takayuki*

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.210 - 215, 2020/10

This paper presents a GPU-based Poisson solver on a block-based adaptive mesh refinement (block-AMR) framework. The block-AMR method is essential for GPU computation and an efficient representation of the nuclear reactor geometry. In this paper, we successfully implement a conjugate gradient method with a state-of-the-art multigrid preconditioner (MG-CG) on the block-AMR framework. GPU kernel performance was measured on the GPU-based supercomputer TSUBAME3.0. The vector-vector sum, matrix-vector product, and dot product kernels in the CG solver performed well, at about 60% of the peak performance. In the MG kernel, the smoothers in a three-stage V-cycle MG method are implemented using a mixed-precision RB-SOR method, which also performed well. For a large-scale Poisson problem with $$453.0 \times 10^6$$ cells, the developed MG-CG method reduced the number of iterations to less than 30% and achieved a $$\times 2.5$$ speedup compared with the original preconditioned CG method.

Journal Articles

GPU-acceleration of locally mesh allocated Poisson solver

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*; Aoki, Takayuki*

Keisan Kogaku Koenkai Rombunshu (CD-ROM), 25, 4 Pages, 2020/06

We have developed the stencil-based CFD code JUPITER for simulating three-dimensional multiphase flows. A GPU-accelerated Poisson solver based on the preconditioned conjugate gradient (P-CG) method with a multigrid preconditioner was developed for JUPITER with a block-structured AMR mesh. All Poisson kernels were implemented in CUDA, and the GPU kernel functions are well tuned to achieve high performance on GPU supercomputers. The developed multigrid solver shows good convergence, with about 1/7 the iterations of the original P-CG method, and a $$\times 3$$ speedup is achieved in the strong scaling test from 8 to 216 GPUs on TSUBAME 3.0.

Journal Articles

A Large-scale aerodynamics study on bicycle racing

Aoki, Takayuki*; Hasegawa, Yuta

Jidosha Gijutsu, 74(4), p.18 - 23, 2020/04

Aerodynamics studies of bicycle racing have been carried out using CFD simulations based on an LES model. For a solo cyclist and groups of 2-4 cyclists, the computed drag forces are in good agreement with wind-tunnel experiments. Different group-riding formations and two competing teams are studied. A large-scale computation for a group of 72 cyclists has been performed using 2.23 billion meshes on a GPU supercomputer.

Journal Articles

Inner and outer-layer similarity of the turbulence intensity profile over a realistic urban geometry

Inagaki, Atsushi*; Wangsaputra, Y.*; Kanda, Manabu*; Yücel, M.*; Onodera, Naoyuki; Aoki, Takayuki*

SOLA (Scientific Online Letters on the Atmosphere) (Internet), 16, p.120 - 124, 2020/00

 Times Cited Count:0 Percentile:0.01 (Meteorology & Atmospheric Sciences)

The similarity of the turbulence intensity profile under inner-layer and outer-layer scalings was examined for an urban boundary layer using numerical simulations. The simulations consider a developing neutral boundary layer over realistic building geometry. The computational domain covers 19.2 km by 4.8 km and extends up to a height of 1 km with 2-m grids. Several turbulence intensity profiles are defined locally in the computational domain. The inner- and outer-layer scalings work well in reducing the scatter of the turbulence intensity within the inner and outer layers, respectively, regardless of the surface geometry. Although the main scatter among the scaled profiles is attributed to the mismatch between the part of the layer and the scaling parameters, its behavior can also be explained by introducing a non-dimensional parameter consisting of a ratio of length or velocity scales.

Journal Articles

Highlight of recent sample environment at J-PARC MLF

Kawamura, Seiko; Hattori, Takanori; Harjo, S.; Ikeda, Kazutaka*; Miyata, Noboru*; Miyazaki, Tsukasa*; Aoki, Hiroyuki; Watanabe, Masao; Sakaguchi, Yoshifumi*; Oku, Takayuki

Neutron News, 30(1), p.11 - 13, 2019/05

In Japanese neutron scattering facilities, sample environment (SE) equipment that is frequently used at an instrument, such as a closed-cycle refrigerator (CCR), has been prepared for the instrument as standard SE and is operated for user experiments by the instrument group. The advantage of this practice is that the SE design can be optimized for the instrument and users' requests can be responded to directly. On the other hand, the SE team in the Materials and Life Science Experimental Facility (MLF) at J-PARC has managed commonly used SE to allow neutron experiments with more advanced SE. In this report, recent SE in the MLF is introduced, highlighting the SE at BL11, BL19, BL21 and BL17 and other SE recently developed by the SE team.

Journal Articles

A Stencil framework to realize large-scale computations beyond device memory capacity on GPU supercomputers

Shimokawabe, Takashi*; Endo, Toshio*; Onodera, Naoyuki; Aoki, Takayuki*

Proceedings of 2017 IEEE International Conference on Cluster Computing (IEEE Cluster 2017) (Internet), p.525 - 529, 2017/09

Stencil-based applications such as CFD codes have achieved high performance on GPU supercomputers. The problem sizes of these applications are limited by the GPU device memory capacity, which is typically smaller than the host memory. On GPU supercomputers, a locality-improvement technique using the temporal blocking method, with memory swapping between host and device, enables large computations beyond the device memory capacity. Our high-productivity stencil framework automatically applies temporal blocking to the boundary exchange required for stencil computation and supports automatic memory swapping provided by an MPI/CUDA wrapper library. A framework-based application for airflow in an urban city maintains 80% performance even for a problem size twice the GPU memory capacity and demonstrates good weak scalability on the TSUBAME 2.5 supercomputer.
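The core of temporal blocking is that a sub-domain padded with a halo of width (stencil radius) $$\times$$ (blocked steps) can be advanced that many steps with no intermediate exchange, the valid region shrinking by one radius per step. A self-contained 1-D sketch of that equivalence (illustrative, not the framework's API):

```python
def step(u):
    """One global step of a radius-1 averaging stencil, periodic ends."""
    n = len(u)
    return [(u[i - 1] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]

def blocked(u, lo, hi, nt):
    """Advance only cells [lo, hi) by nt steps using a halo of width nt
    (radius 1 x nt steps) copied once up front -- no exchanges between
    the nt steps.  Each step shrinks the valid region by one cell per
    side, so exactly the cells [lo, hi) remain at the end."""
    n = len(u)
    work = [u[i % n] for i in range(lo - nt, hi + nt)]
    for _ in range(nt):
        work = [(work[i - 1] + work[i] + work[i + 1]) / 3.0
                for i in range(1, len(work) - 1)]
    return work
```

The blocked result matches the global computation on the interior, at the cost of redundant halo work; the framework trades this recomputation against host-device transfer and MPI latency.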

Journal Articles

A Numerical study of turbulence statistics and the structure of a spatially-developing boundary layer over a realistic urban geometry

Inagaki, Atsushi*; Kanda, Manabu*; Ahmad, N. H.*; Yagi, Ayako*; Onodera, Naoyuki; Aoki, Takayuki*

Boundary-Layer Meteorology, 164(2), p.161 - 181, 2017/08

 Times Cited Count:12 Percentile:53.98 (Meteorology & Atmospheric Sciences)

The applicability of outer-layer scaling is examined by numerical simulation of a developing neutral boundary layer over a realistic building geometry of Tokyo. Large-eddy simulations are carried out over a large computational domain of 19.2 km $$\times$$ 4.8 km $$\times$$ 1 km, with a fine grid spacing (2 m), using the lattice Boltzmann method on massively parallel graphics processing units. The simulation results show that outer-layer features are maintained for turbulence statistics in the upper part of the boundary layer, as well as for the width of the predominant streaky structures throughout the entire boundary layer. This is caused by the existence of very large streaky structures extending throughout the entire boundary layer, which follow outer-layer scaling with a self-preserving development. We assume a top-down mechanism in the physical interpretation of the results.

Journal Articles

Current status of electrostatic accelerator at TIARA

Usui, Aya; Chiba, Atsuya; Yamada, Keisuke; Yokoyama, Akihito; Kitano, Toshihiko*; Takayama, Terumitsu*; Orimo, Takao*; Kanai, Shinji*; Aoki, Yuki*; Hashizume, Masashi*; et al.

Dai-28-Kai Tandemu Kasokuki Oyobi Sono Shuhen Gijutsu No Kenkyukai Hokokushu, p.117 - 119, 2015/12

no abstracts in English

Journal Articles

Current status of electrostatic accelerators at TIARA

Usui, Aya; Uno, Sadanori; Chiba, Atsuya; Yamada, Keisuke; Yokoyama, Akihito; Kitano, Toshihiko*; Takayama, Terumitsu*; Orimo, Takao*; Kanai, Shinji*; Aoki, Yuki*; et al.

Dai-27-Kai Tandemu Kasokuki Oyobi Sono Shuhen Gijutsu No Kenkyukai Hokokushu, p.118 - 121, 2015/03

no abstracts in English

Journal Articles

Operation of electrostatic accelerators

Uno, Sadanori; Chiba, Atsuya; Yamada, Keisuke; Yokoyama, Akihito; Usui, Aya; Saito, Yuichi; Ishii, Yasuyuki; Sato, Takahiro; Okubo, Takeru; Nara, Takayuki; et al.

JAEA-Review 2013-059, JAEA Takasaki Annual Report 2012, P. 179, 2014/03

The three electrostatic accelerators at TIARA were operated on schedule in fiscal year 2012, except for schedule changes due to user cancellations. The yearly operation times of the 3 MV tandem accelerator, the 400 kV ion implanter and the 3 MV single-ended accelerator were at their usual levels, totaling 2,073, 1,847 and 2,389 hours, respectively. The tandem accelerator had no trouble, whereas the ion implanter and the single-ended accelerator were stopped by troubles for one day and four days, respectively. A molecular ion beam of helium hydride was generated by the ion implanter, because users required irradiation with several kinds of cluster ions to study irradiation effects; the beam intensity was 50 nA at 200 kV. A tungsten (W) ion beam at 15 MeV was accelerated by the tandem accelerator at the request of a researcher in the field of nuclear fusion, with an intensity of 20 nA at a charge state of 4+.

Journal Articles

Current status of electrostatic accelerators at TIARA

Uno, Sadanori; Chiba, Atsuya; Yamada, Keisuke; Yokoyama, Akihito; Usui, Aya; Kitano, Toshihiko*; Takayama, Terumitsu*; Orimo, Takao*; Kanai, Shinji*; Aoki, Yuki*; et al.

Dai-26-Kai Tandemu Kasokuki Oyobi Sono Shuhen Gijutsu No Kenkyukai Hokokushu, p.79 - 81, 2013/07

The three electrostatic accelerators at TIARA were operated on schedule in fiscal year 2012, except for schedule changes due to user cancellations. The yearly operation times of the 3 MV tandem accelerator, the 400 kV ion implanter and the 3 MV single-ended accelerator were at their usual levels, totaling 2,073, 1,847 and 2,389 hours, respectively. The tandem accelerator had no trouble, whereas the ion implanter and the single-ended accelerator were stopped by troubles for one day and four days, respectively. The ion implanter generated a molecular ion beam of helium hydride using the Freeman-type ion source at the request of a user; the beam intensity was 50 nA at 200 kV.

Journal Articles

Operation of electrostatic accelerators

Uno, Sadanori; Chiba, Atsuya; Yamada, Keisuke; Yokoyama, Akihito; Saito, Yuichi; Ishii, Yasuyuki; Sato, Takahiro; Okubo, Takeru; Nara, Takayuki; Kitano, Toshihiko*; et al.

JAEA-Review 2012-046, JAEA Takasaki Annual Report 2011, P. 173, 2013/01

The three electrostatic accelerators at TIARA suffered no mechanical damage in the Tohoku earthquake of March 11, 2011, but they could not be operated during April due to the planned power outages by TEPCO. The accelerators were additionally operated on Saturdays, ten days in total, to compensate for the lost experiment time; as a result, the yearly operation time was kept at its usual level. An erbium (Er) ion beam at 11.7 MeV was newly accelerated by the tandem accelerator, with an intensity of 20 nA at a charge state of 3+. Sequential generation and irradiation of two different kinds of fullerene ions was achieved at the ion implanter by a mixed-powder method, without exchanging the Freeman-type ion source, at the user's request.

Journal Articles

Operation of electrostatic accelerators at TIARA

Uno, Sadanori; Chiba, Atsuya; Yamada, Keisuke; Yokoyama, Akihito; Saito, Yuichi; Ishii, Yasuyuki; Sato, Takahiro; Okubo, Takeru; Nara, Takayuki; Kitano, Toshihiko*; et al.

Dai-7-Kai Takasaki Oyo Kenkyu Shimpojiumu Yoshishu, P. 119, 2012/10

The three electrostatic accelerators at TIARA suffered no damage in the Tohoku earthquake of March 11, 2011, but they could not be operated until the end of April due to the planned power outages and restricted access to the radiation-controlled area. The accelerators were additionally operated on Saturdays, twelve days in total, to compensate for the lost experiment time; as a result, the yearly operation time was kept at its usual level. Leakage of SF$$_{6}$$ gas from the base flange of the tandem accelerator tank was stopped by a new type of Viton gasket with a rectangular cross section.

Journal Articles

Current status of electrostatic accelerators at TIARA

Uno, Sadanori; Chiba, Atsuya; Yamada, Keisuke; Yokoyama, Akihito; Kitano, Toshihiko*; Takayama, Terumitsu*; Orimo, Takao*; Kanai, Shinji*; Aoki, Yuki*; Yamada, Naoto*; et al.

Dai-25-Kai Tandemu Kasokuki Oyobi Sono Shuhen Gijutsu No Kenkyukai Hokokushu, p.64 - 66, 2012/07

The three electrostatic accelerators at TIARA suffered no damage in the Tohoku earthquake of March 11, 2011, but they could not be operated until the end of April due to the planned power outages and restricted access to the radiation-controlled area. The accelerators were additionally operated on Saturdays, twelve days in total, to compensate for the lost experiment time; as a result, the yearly operation time was kept at its usual level. Leakage of SF$$_{6}$$ gas from the base flange of the tandem accelerator tank was stopped by a new type of Viton gasket with a rectangular cross section. The ion implanter generated two kinds of fullerene ions using a mixed source material and a temperature-controlled oven, without exchanging the ion source.
