Study on acceleration of locally mesh-refined lattice Boltzmann simulation using GPU interconnect technology

Hasegawa, Yuta ; Onodera, Naoyuki ; Idomura, Yasuhiro

To reduce memory usage and accelerate data communication in a locally mesh-refined lattice Boltzmann code, we implemented an intra-node multi-GPU version using CUDA Unified Memory. In microbenchmark tests on a uniform mesh, a 3D diffusion problem reached parallel efficiencies of 96.4% in weak scaling and 94.6% in strong scaling, and a D3Q27 lattice Boltzmann problem reached 99.3% in weak scaling and 56.5% in strong scaling. In the locally mesh-refined lattice Boltzmann code, the present method reduced memory usage by 25.5% relative to the flat MPI implementation. However, its strong-scaling parallel efficiency was only 9.0%, which was worse than that of the flat MPI implementation.
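
The abstract does not say how parallel efficiency was computed. As a reading aid, the sketch below uses the conventional definitions, which are an assumption here and not necessarily the authors' exact ones: strong scaling keeps the total problem size fixed, and weak scaling keeps the problem size per GPU fixed. The function names and the timings are hypothetical and are not taken from the paper.

```python
def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Fixed total problem size: the ideal time on n GPUs is t1 / n."""
    return t1 / (n * tn)


def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Fixed problem size per GPU: the ideal time on n GPUs equals t1."""
    return t1 / tn


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds (illustrative only).
    t_single = 8.0   # one GPU
    t_strong = 2.5   # four GPUs, same total problem
    t_weak = 8.4     # four GPUs, four times the total problem
    n_gpus = 4

    print(f"strong: {strong_scaling_efficiency(t_single, t_strong, n_gpus):.1%}")
    print(f"weak:   {weak_scaling_efficiency(t_single, t_weak):.1%}")
```

Under these definitions, a strong-scaling efficiency below 1/n means the n-GPU run is slower than the single-GPU run.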

Language	:	Japanese
Meeting title	:	32nd Computational Mechanics Division Conference of the Japan Society of Mechanical Engineers (CMD2019)
Held date	:	2019/09
Location (city)	:	Kawagoe
Location (country)	:	Japan
Keywords	:	CUDA; Unified Memory; NVLink; Lattice Boltzmann Method; Locally Refined Mesh
Research Facility	:	Large-scale computers and supercomputers (Tokai)

Registration No. : BB20191173