
Tree cutting approach for domain partitioning on forest-of-octrees-based block-structured static adaptive mesh refinement with lattice Boltzmann method


Hasegawa, Yuta; Aoki, Takayuki*; Kobayashi, Hiromichi*; Idomura, Yasuhiro; Onodera, Naoyuki

An aerodynamics simulation code based on the lattice Boltzmann method (LBM) with forest-of-octrees-based block-structured local mesh refinement (LMR) was implemented, and its performance was evaluated on GPU-based supercomputers. We found that the conventional domain partitioning algorithm based on a space-filling curve (SFC) results in costly halo communication in our aerodynamics simulations. Our new tree cutting approach improved the locality and the topology of the partitioned sub-domains and reduced the communication cost to one-third to one-fourth of that of the original SFC approach. In the strong scaling test, the code achieved a maximum speedup of 1.82× and a performance of 2207 MLUPS (mega-lattice updates per second) on 128 GPUs. In the weak scaling test, the parallel efficiency was 93.4% from 8 to 128 GPUs, and the code achieved 9620 MLUPS on 128 GPUs with 4.473 billion grid points.
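For readers unfamiliar with the conventional SFC-based partitioning that the abstract uses as its baseline, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a Morton (Z-order) curve, equally weighted blocks on a uniform block grid, and hypothetical helper names (`morton3d`, `sfc_partition`). The abstract does not state which curve the code uses, and the proposed tree cutting approach is not shown here.

```python
def morton3d(ix, iy, iz, bits=10):
    """Interleave the bits of three non-negative integers into a Morton (Z-order) key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def sfc_partition(blocks, num_ranks):
    """Order blocks along the curve, then cut the ordered list into contiguous chunks.

    blocks: iterable of (ix, iy, iz) integer block coordinates (assumed equal cost).
    Returns a list of num_ranks lists whose sizes differ by at most one block.
    """
    ordered = sorted(blocks, key=lambda b: morton3d(*b))
    n = len(ordered)
    return [
        ordered[r * n // num_ranks:(r + 1) * n // num_ranks]
        for r in range(num_ranks)
    ]
```

Because each rank receives a contiguous stretch of the curve, the block count is balanced, but the shape of each sub-domain is dictated by the curve rather than by the communication pattern. This is the kind of locality and topology issue the abstract refers to.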







Field: Computer Science, Theory & Methods


