Study on acceleration of locally mesh-refined lattice Boltzmann simulation using GPU interconnect technology

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

To reduce memory usage and accelerate data communication in a locally mesh-refined lattice Boltzmann code, we implemented and tested intra-node multi-GPU parallelization using CUDA Unified Memory. In microbenchmark tests on a uniform mesh, we achieved parallel efficiencies of 96.4% (weak scaling) and 94.6% (strong scaling) for a 3D diffusion problem, and 99.3% (weak scaling) and 56.5% (strong scaling) for a D3Q27 lattice Boltzmann problem. In the locally mesh-refined lattice Boltzmann code, the present method reduced memory usage by 25.5% compared with the Flat MPI implementation. However, its strong-scaling parallel efficiency was only 9.0%, lower than that of the Flat MPI implementation.
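The abstract reports parallel efficiencies but does not state their definitions or baseline. As a hedged reference, the conventional definitions are sketched below. This assumes T(N) is the elapsed time per step on N GPUs and that the baseline is a single-GPU run; the record does not state which baseline was actually used. The symbols S, M_UM and M_FlatMPI are illustrative notation, not taken from the source.

$$
E_{\mathrm{weak}}(N) = \frac{T(1)}{T(N)} \ \text{(fixed problem size per GPU)}, \qquad
E_{\mathrm{strong}}(N) = \frac{T(1)}{N\,T(N)} \ \text{(fixed total problem size)}
$$

$$
S(N) = N\,E_{\mathrm{strong}}(N), \qquad
M_{\mathrm{UM}} = (1 - 0.255)\,M_{\mathrm{FlatMPI}} = 0.745\,M_{\mathrm{FlatMPI}}
$$

Under these assumed definitions, S(N) is the strong-scaling speedup over one GPU. M_UM is the memory footprint of the Unified Memory version relative to the Flat MPI footprint M_FlatMPI, given the reported 25.5% reduction.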

Presentation language	:	Japanese
Conference	:	JSME 32nd Computational Mechanics Conference (CMD2019)
Conference date	:	2019/09
City	:	Kawagoe
Country	:	Japan
Keywords	:	CUDA; Unified Memory; NVLink; Lattice Boltzmann Method; Locally Refined Mesh

Facilities used	:	Large-scale computers and supercomputers (Tokai)

Registration number : BB20191173