GPU acceleration of communication avoiding Chebyshev basis conjugate gradient solver for multiphase CFD simulations

Ali, Y.*; Onodera, Naoyuki; Idomura, Yasuhiro; Ina, Takuya*; Imamura, Toshiyuki*

Iterative methods for solving large linear systems are common components of computational fluid dynamics (CFD) codes. The Preconditioned Conjugate Gradient (P-CG) method is one of the most widely used iterative methods. However, in the P-CG method, global collective communication is a crucial bottleneck, especially on accelerated computing platforms. To resolve this issue, communication-avoiding (CA) variants of the P-CG method are becoming increasingly important. In this paper, the P-CG and Preconditioned Chebyshev Basis CA CG (P-CBCG) solvers in the multiphase CFD code JUPITER are ported to the latest V100 GPUs. All GPU kernels are highly optimized to achieve about 90% of the roofline performance, the block Jacobi preconditioner is redesigned to exploit the high computing power of GPUs, and the remaining bottleneck of halo data communication is avoided by overlapping communication and computation. The overall performance of the P-CG and P-CBCG solvers is determined by the competition between the CA properties of the global collective communication and the halo data communication, which indicates the importance of the inter-node interconnect bandwidth per GPU. The developed GPU solvers are accelerated up to 2x compared with the former CPU solvers on KNLs, and excellent strong scaling is achieved up to 7,680 GPUs on Summit.
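
To illustrate why global collective communication dominates P-CG, the following is a minimal serial sketch in pure Python, not the JUPITER implementation. It assumes a 1D Poisson matrix and a point Jacobi preconditioner (block size 1) standing in for the block Jacobi preconditioner; the names `matvec`, `dot`, and `pcg` are hypothetical. Each `dot` call would become a global reduction in a distributed run, and the neighbor accesses in `matvec` would require halo data exchange.

```python
# Minimal sketch of preconditioned CG (P-CG) on A = tridiag(-1, 2, -1).
# Assumptions: serial execution, point Jacobi preconditioner M = diag(A) = 2I.

def matvec(x):
    """Return y = A x for the tridiagonal matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]      # neighbor access: halo exchange in parallel
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def dot(u, v, counter):
    """Inner product; each call would be a global reduction when distributed."""
    counter[0] += 1
    return sum(a * b for a, b in zip(u, v))

def pcg(b, tol=1e-10, max_iter=1000):
    """Solve A x = b; return (x, iterations, number of inner products)."""
    n = len(b)
    reductions = [0]
    x = [0.0] * n
    r = list(b)                    # r = b - A x with x = 0
    z = [ri / 2.0 for ri in r]     # z = M^{-1} r
    p = list(z)
    rz = dot(r, z, reductions)
    b_norm2 = dot(b, b, reductions)
    iters = 0
    for k in range(max_iter):
        q = matvec(p)
        alpha = rz / dot(p, q, reductions)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        iters = k + 1
        if dot(r, r, reductions) <= tol * tol * b_norm2:
            break                  # converged: relative residual below tol
        z = [ri / 2.0 for ri in r]
        rz_new = dot(r, z, reductions)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, iters, reductions[0]
```

In this sketch every iteration performs two or three inner products, so the number of global reductions grows linearly with the iteration count. Communication-avoiding variants such as P-CBCG aim to reduce how often such reductions are needed.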

Language	:	English
Published in	:	Proceedings of 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2019)
Volume	:
Issue	:
Pages	:	p.1 - 8
Publication date	:	2019/11
Conference	:	10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA 2019)
Conference date	:	2019/11
City	:	Denver
Country	:	U.S.A.
Keywords	:	Communication Avoiding Krylov subspace Method; CFD; GPU

Patent data	:
PDF	:

Paper URL	:	https://doi.org/10.1109/ScalA49573.2019.00006
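Facility used	:	Large-scale computer / supercomputer (Tokai)
Press release	:
Explanatory article (outreach magazine)	:	Multiphase flow analysis using the world's largest GPU supercomputer; Development of a communication-avoiding matrix solver for GPUs
Contract / collaborative research partner	:	Ministry of Education, Culture, Sports, Science and Technology (MEXT)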

InCites™	:	Percentile: 93.69; Field: Computer Science, Theory & Methods

Registration No. : BB20191584
Abstract collection No. : 48000165
Paper submission No. : 22516
