Optimization of fusion kernels on accelerators with indirect or strided memory access patterns

Asahi, Yuichi*; Latu, G.*; Ina, Takuya; Idomura, Yasuhiro; Grandgirard, V.*; Garbet, X.*

High-dimensional stencil computations from fusion plasma turbulence codes involve complex memory access patterns: indirect memory access in a Semi-Lagrangian scheme and strided memory access in a Finite-Difference scheme. These computations are optimized on accelerators such as GPGPUs and Xeon Phi coprocessors. On both devices, the Array of Structures of Arrays (AoSoA) data layout is preferable because it yields contiguous memory accesses. It is shown that effective use of the local cache, achieved by improving spatial and temporal data locality, is critical on Xeon Phi. On GPGPUs, using texture memory improves the performance of the indirect memory accesses in the Semi-Lagrangian scheme. With these optimizations, the fusion kernels on accelerators run 1.4x to 8.1x faster than on a Sandy Bridge CPU.
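
The abstract names the Array of Structures of Arrays (AoSoA) layout without defining it. As a rough, generic illustration (not the authors' code), the sketch below compares the address arithmetic of AoS, SoA, and AoSoA for N records with F fields each. The tile width `width` is a hypothetical parameter standing in for a SIMD vector length or GPU warp size; the layout and tile width actually used in the paper are not stated in the abstract.

```python
# Generic sketch of three memory layouts for n records with nfields fields.
# `width` (tile size) is a hypothetical stand-in for a SIMD/warp width.

def aos_offset(i: int, f: int, nfields: int) -> int:
    """Array of Structures: all fields of record i are adjacent,
    so field f of consecutive records is strided by nfields."""
    return i * nfields + f

def soa_offset(i: int, f: int, n: int) -> int:
    """Structure of Arrays: field f of all n records forms one long array,
    so field f of consecutive records is contiguous."""
    return f * n + i

def aosoa_offset(i: int, f: int, nfields: int, width: int) -> int:
    """Array of Structures of Arrays: records are grouped into tiles of
    `width`; inside a tile, each field is a short contiguous array."""
    tile, lane = divmod(i, width)
    return tile * nfields * width + f * width + lane

def pack_aosoa(records, width):
    """Pack a list of equal-length tuples into a flat AoSoA buffer.
    Assumes len(records) is a multiple of width (pad the data otherwise)."""
    n = len(records)
    nfields = len(records[0])
    assert n % width == 0, "pad the record count to a multiple of width"
    buf = [0.0] * (n * nfields)
    for i, rec in enumerate(records):
        for f, value in enumerate(rec):
            buf[aosoa_offset(i, f, nfields, width)] = value
    return buf

if __name__ == "__main__":
    # Eight records with two fields each, tiled by four.
    recs = [(float(i), 10.0 * i) for i in range(8)]
    print(pack_aosoa(recs, 4))
```

With eight two-field records and `width=4`, the buffer holds field 0 of records 0 to 3, then field 1 of records 0 to 3, then the same for records 4 to 7. A group of `width` lanes reading one field therefore touches consecutive addresses, as in SoA, while all fields of a tile stay close together in memory, as in AoS.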

Category: Computer Science, Theory & Methods