Enhancing intra-node Multi-GPU stencil calculations on DGX-2 using concurrent-addressing with Unified Memory

Hasegawa, Yuta; Onodera, Naoyuki; Idomura, Yasuhiro

In the "CityLBM" project at JAEA, a real-time urban wind prediction code based on adaptive mesh refinement (AMR) was developed. For the next-generation CityLBM code, ensemble simulations are needed to improve the reliability of the predictions. To make this feasible, the memory footprint of each simulation must be reduced to fit within a single node (4-16 GPUs per simulation). To reduce memory usage and accelerate data communication in the AMR code, we tried an intra-node multi-GPU implementation using Unified Memory in CUDA. This approach makes parallel GPU implementation straightforward, because accesses to Unified Memory are handled automatically, either through the local GPU's HBM2 or through NVLink to a neighboring GPU. We implemented multi-GPU calculations for a 3D diffusion equation and a lattice Boltzmann equation on a uniform mesh, and evaluated their weak and strong scalability as well as NVLink performance.
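
The concurrent-addressing idea can be illustrated with a small CPU sketch (NumPy, not the authors' CUDA code). The functions `diffusion_step` and `diffusion_step_partitioned`, the slab decomposition along z, and the fixed-boundary update are hypothetical, illustrative choices that do not come from the paper. A single shared array plays the role of a Unified Memory allocation (in CUDA this would typically be obtained with `cudaMallocManaged`), and each slab stands in for one GPU that reads its neighbor's boundary planes directly instead of exchanging halo buffers.

```python
import numpy as np


def diffusion_step(f, r):
    """Reference single-domain update: one explicit 7-point stencil step of the 3D
    diffusion equation, with r = kappa*dt/dx**2. Boundary cells are held fixed."""
    fn = f.copy()
    c = f[1:-1, 1:-1, 1:-1]
    lap = (f[2:, 1:-1, 1:-1] + f[:-2, 1:-1, 1:-1]
           + f[1:-1, 2:, 1:-1] + f[1:-1, :-2, 1:-1]
           + f[1:-1, 1:-1, 2:] + f[1:-1, 1:-1, :-2] - 6.0 * c)
    fn[1:-1, 1:-1, 1:-1] = c + r * lap
    return fn


def diffusion_step_partitioned(f, r, n_parts):
    """Same update with the z axis split into n_parts slabs, one per hypothetical GPU.

    Each slab reads the planes just above and below it directly from the shared
    array f (the stand-in for a Unified Memory allocation), so there are no halo
    buffers and no explicit copies between partitions."""
    nz = f.shape[0]
    fn = f.copy()
    # Interior z indices 1..nz-2 divided into contiguous slabs.
    bounds = np.linspace(1, nz - 1, n_parts + 1).round().astype(int)
    for p in range(n_parts):  # on a real node: one kernel per GPU, run concurrently
        z0, z1 = bounds[p], bounds[p + 1]
        c = f[z0:z1, 1:-1, 1:-1]
        lap = (f[z0 + 1:z1 + 1, 1:-1, 1:-1] + f[z0 - 1:z1 - 1, 1:-1, 1:-1]  # neighbor planes
               + f[z0:z1, 2:, 1:-1] + f[z0:z1, :-2, 1:-1]
               + f[z0:z1, 1:-1, 2:] + f[z0:z1, 1:-1, :-2] - 6.0 * c)
        fn[z0:z1, 1:-1, 1:-1] = c + r * lap
    return fn


rng = np.random.default_rng(0)
f = rng.random((34, 16, 16))
print(np.allclose(diffusion_step(f, 0.1), diffusion_step_partitioned(f, 0.1, n_parts=4)))  # True
```

Because every slab writes only to its own part of `fn` and reads only from `f`, the partitions are independent within a time step, which is what lets them run concurrently on separate GPUs. In the setting the abstract describes, the reads that cross a partition boundary are the ones that would be served over NVLink rather than from local HBM2.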

Language	:	English
Meeting title	:	GPU Technology Conference Silicon Valley (GTC 2020)
Held date	:	2020/03
Location (city)	:	San Jose (online)
Location (country)	:	U.S.A.
Keywords	:	Multi-GPU computing; Unified Memory
Research Facility	:	Large-scale computers / supercomputers (Tokai)

Registration No. : BB20191973