Communication reduced multi-time-step algorithm for the AMR-based lattice Boltzmann method on GPU-rich supercomputers

Onodera, Naoyuki; Idomura, Yasuhiro; Ali, Y.*; Shimokawabe, Takashi*

We have developed a communication-reduced multi-time-step (CRMT) algorithm for the Post-K supercomputer and measured its performance on GPU-based supercomputers. The algorithm is based on the temporal blocking method and improves computational efficiency by replacing a communication bottleneck with additional computation. The proposed method is easily applied to explicit time integration schemes, and is implemented in the extreme-scale airflow simulation code CityLBM. We evaluate the performance of the CRMT algorithm on the GPU-based supercomputers TSUBAME and Reedbush. With the CRMT algorithm, the communication cost is reduced by 64%, and weak and strong scaling are improved up to 200 GPUs. The obtained performance indicates that real-time airflow simulation of an area of about 2 km square, at a wind speed of 5 m/s, is feasible at 1 m resolution. We conclude that the CRMT algorithm is indispensable for the AMR-LBM to realize real-time simulation on future exascale systems.
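
The idea behind temporal blocking, trading communication for redundant computation, can be illustrated with a minimal sketch. The code below is not the CRMT algorithm and not CityLBM: it assumes a toy one-dimensional explicit 3-point stencil with periodic boundaries, and the names `step`, `reference`, and `run_decomposed` are hypothetical. Each subdomain receives a halo of width k once, then advances k time steps locally while the valid region shrinks by one cell per side per step, so the number of halo exchanges drops by a factor of k.

```python
def step(u):
    """One explicit 3-point stencil update; the result is 2 cells shorter than the input."""
    return [0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]
            for i in range(1, len(u) - 1)]


def reference(u, steps):
    """Single-domain run with periodic boundaries (no decomposition)."""
    n = len(u)
    for _ in range(steps):
        u = [0.25 * u[(i - 1) % n] + 0.5 * u[i] + 0.25 * u[(i + 1) % n]
             for i in range(n)]
    return u


def run_decomposed(u, steps, parts, k):
    """Run on `parts` periodic subdomains, exchanging k-wide halos every k steps.

    Returns the assembled field and the number of halo exchanges performed.
    k = 1 is the conventional exchange-every-step scheme.
    """
    n = len(u)
    size = n // parts
    assert n % parts == 0 and steps % k == 0 and 1 <= k <= size
    owned = [u[p * size:(p + 1) * size] for p in range(parts)]
    exchanges = 0
    for _ in range(steps // k):
        # "Communication": each subdomain receives k cells from each neighbour.
        padded = []
        for p in range(parts):
            left = owned[(p - 1) % parts][-k:]
            right = owned[(p + 1) % parts][:k]
            padded.append(left + owned[p] + right)
        exchanges += 1
        # k local steps without communication; halo cells are recomputed
        # redundantly and the block shrinks by one cell per side per step.
        for _ in range(k):
            padded = [step(block) for block in padded]
        owned = padded  # each block is back to exactly `size` cells
    return [x for block in owned for x in block], exchanges


u0 = [float((7 * i) % 11) for i in range(32)]
ref = reference(u0, 8)
every_step, n_every = run_decomposed(u0, 8, parts=4, k=1)
blocked, n_blocked = run_decomposed(u0, 8, parts=4, k=4)
print(n_every, n_blocked)
```

In this toy setting both decomposed runs reproduce the single-domain result, while the blocked run performs a quarter of the exchanges at the cost of k(k-1) redundant cell updates per subdomain per block of k steps. How CRMT handles the AMR grid hierarchy and the lattice Boltzmann kernels is not covered by this sketch.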



