Acceleration of stencil-based fusion kernels

Asahi, Yuichi*; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*

Computation kernels of fusion plasma turbulence codes based on the semi-Lagrangian scheme and the finite-difference scheme are optimized on the latest many-core processors such as GPGPU, Xeon Phi, and FX100, and speedups of 1.4x to 8.1x are achieved. The affinity between the memory access patterns of each numerical scheme and the memory and cache architecture of each processor is studied, and different optimization techniques are developed for each architecture. On Xeon Phi, thread load balance is improved, and an optimization technique for effective use of the local cache is developed. On GPGPU, an optimization technique using texture memory and an implementation that reuses registers are developed. On FX100, by contrast, the conventional optimization techniques for CPUs are found to work.
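The contrast in memory access patterns between the two schemes can be illustrated with a minimal one-dimensional sketch. It is not taken from the codes discussed in the abstract: the function names `fd_derivative` and `semi_lagrangian_step`, the periodic grid, the fourth-order stencil, and the linear interpolation are all assumptions chosen for brevity (production semi-Lagrangian codes typically use higher-order interpolation). The finite-difference kernel reads a fixed set of neighbours, while the semi-Lagrangian kernel reads from an index that depends on the local velocity.

```python
import math


def fd_derivative(f, h):
    """Fourth-order central difference on a periodic grid.

    Every output point reads the same fixed 5-point neighbourhood,
    so the access pattern is regular and known in advance.
    """
    n = len(f)
    return [
        (-f[(i + 2) % n] + 8.0 * f[(i + 1) % n]
         - 8.0 * f[(i - 1) % n] + f[(i - 2) % n]) / (12.0 * h)
        for i in range(n)
    ]


def semi_lagrangian_step(f, velocity, h, dt):
    """One semi-Lagrangian advection step on a periodic grid.

    Assumption: linear interpolation at the foot of the characteristic.
    The index j depends on velocity[i], so the access pattern is
    data-dependent rather than fixed.
    """
    n = len(f)
    out = [0.0] * n
    for i in range(n):
        # Trace the characteristic back from grid point i, in grid units.
        foot = i - velocity[i] * dt / h
        j = math.floor(foot)   # left neighbour of the foot
        w = foot - j           # interpolation weight in [0, 1)
        out[i] = (1.0 - w) * f[j % n] + w * f[(j + 1) % n]
    return out


# Illustrative setup (hypothetical values): sin(x) on [0, 2*pi).
n = 64
h = 2.0 * math.pi / n
f = [math.sin(i * h) for i in range(n)]

# The derivative of sin(x) should approximate cos(x).
df = fd_derivative(f, h)
max_err = max(abs(df[i] - math.cos(i * h)) for i in range(n))

# Uniform velocity and a time step of three cells shift the profile by three points.
shifted = semi_lagrangian_step(f, [1.0] * n, h, dt=3 * h)
```

With a spatially varying `velocity`, neighbouring output points in `semi_lagrangian_step` may read from widely separated inputs, which is the kind of irregular access that interacts differently with caches and texture memory than the fixed stencil in `fd_derivative`.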

Language	:	English
Meeting title	:	3rd Accelerated Computing For Fusion Workshop
Held date	:	2016/11
Location (city)	:	Saclay
Location (country)	:	France
Keywords	:	Gyrokinetic; Stencil computation; Accelerator
Cooperating Institute	:	CEA

Registration No. : BB20161473