
Study on acceleration of locally mesh-refined lattice Boltzmann simulation using GPU interconnect technology

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

To reduce memory usage and accelerate data communication in a locally mesh-refined lattice Boltzmann code, we tested an intra-node multi-GPU implementation using CUDA Unified Memory. In microbenchmark tests on a uniform mesh, a 3D diffusion problem achieved parallel efficiencies of 96.4% in weak scaling and 94.6% in strong scaling, and a D3Q27 lattice Boltzmann problem achieved 99.3% in weak scaling and 56.5% in strong scaling. In the locally mesh-refined lattice Boltzmann code, the present method reduced memory usage by 25.5% compared with the flat MPI implementation. However, this code showed only 9.0% parallel efficiency in strong scaling, which was worse than that of the flat MPI implementation.
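
The abstract reports parallel efficiency without stating the formula used. As a minimal sketch, assuming the conventional definitions (strong scaling: fixed total problem size; weak scaling: fixed problem size per GPU), the two quantities can be computed as follows. The function names and the timings in the usage lines are hypothetical and are not taken from the report.

```python
def strong_scaling_efficiency(t_single: float, t_multi: float, n_gpus: int) -> float:
    """Strong scaling: the total problem size is fixed.

    Ideal run time on n_gpus devices is t_single / n_gpus, so the
    efficiency is the achieved speedup divided by n_gpus.
    """
    if t_single <= 0 or t_multi <= 0 or n_gpus < 1:
        raise ValueError("timings must be positive and n_gpus >= 1")
    return t_single / (n_gpus * t_multi)


def weak_scaling_efficiency(t_single: float, t_multi: float) -> float:
    """Weak scaling: the problem size per device is fixed.

    Ideal run time stays equal to t_single as devices are added, so the
    efficiency is the ratio of single-device to multi-device run time.
    """
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("timings must be positive")
    return t_single / t_multi


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    print(f"strong: {strong_scaling_efficiency(8.0, 2.5, 4):.1%}")
    print(f"weak:   {weak_scaling_efficiency(2.0, 2.5):.1%}")
```

Under these definitions, a strong scaling efficiency below 1/N on N GPUs means the multi-GPU run is slower than the single-GPU run.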
