Search Results: Records 1-7 displayed on this page of 7
Journal Articles

Performance portable implementation of a kinetic plasma simulation mini-app with a higher level abstraction and directives

Asahi, Yuichi; Latu, G.*; Bigot, J.*; Grandgirard, V.*

Proceedings of Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2020 (SNA + MC 2020), p.218 - 224, 2020/10

Performance portability is expected to be a critical issue in the upcoming exascale era. We explore a performance-portable approach for a fusion plasma turbulence simulation code based on the kinetic model, namely the GYSELA code. For this purpose, we extract the key features of GYSELA, such as its high dimensionality (more than 4D) and its semi-Lagrangian scheme, and encapsulate them in a mini-application that solves a simplified Vlasov-Poisson system similar to that of GYSELA. We implement the mini-app with OpenACC, OpenMP 4.5, and Kokkos while suppressing unnecessary duplication of code lines. Based on this experience, we discuss the advantages and disadvantages of OpenACC, OpenMP 4.5, and Kokkos from the viewpoints of performance portability, readability, and productivity.
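The backward semi-Lagrangian scheme mentioned in this abstract can be illustrated with a minimal sketch. This is a pure-Python toy for a 1D periodic advection equation, not code from GYSELA or the mini-app (which are compiled HPC codes with directives or Kokkos); the function name and interface are illustrative only.

```python
import math

def semi_lagrangian_step(f, v, dt, dx):
    """One backward semi-Lagrangian step for df/dt + v*df/dx = 0 on a
    periodic 1D grid: trace each grid point back along the characteristic
    and interpolate f linearly at the departure point (the "foot")."""
    n = len(f)
    length = n * dx
    out = [0.0] * n
    for i in range(n):
        # Foot of the characteristic arriving at grid point i.
        x_dep = (i * dx - v * dt) % length
        s = x_dep / dx
        i0 = int(math.floor(s)) % n      # left neighbour of the foot
        i1 = (i0 + 1) % n                # right neighbour (periodic wrap)
        w = s - math.floor(s)            # linear interpolation weight
        out[i] = (1.0 - w) * f[i0] + w * f[i1]
    return out
```

Advecting by exactly one cell width (v*dt = dx) shifts the profile by one grid index, which is a convenient sanity check for such a scheme.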

Journal Articles

MPI/OpenMP hybrid parallelization of a Monte Carlo neutron/photon transport code MVP

Nagaya, Yasunobu; Adachi, Masaaki*

Proceedings of International Conference on Mathematics & Computational Methods Applied to Nuclear Science & Engineering (M&C 2017) (USB Flash Drive), 6 Pages, 2017/04

MVP is a general-purpose Monte Carlo code for neutron and photon transport calculations based on the continuous-energy method. To speed up MVP, hybrid parallelization is applied with the message-passing library MPI and the shared-memory multiprocessing library OpenMP. Performance tests have been carried out for an eigenvalue calculation of a fast-reactor subassembly, a fixed-source calculation of a coupled neutron/photon problem, and a PWR full-core model. Comparisons have been made between MPI-only runs with 4 processes and hybrid runs with 4 processes × 3 threads. As a result, the hybrid parallelism reduces the elapsed time by 16% to 34% while memory usage remains almost the same.
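The hybrid decomposition described here (4 MPI processes × 3 OpenMP threads) amounts to partitioning the Monte Carlo histories among rank/thread pairs. The following is a hedged sketch of one common partitioning rule, contiguous blocks with remainders assigned to the lowest-numbered workers; it is not taken from MVP, and all names are hypothetical.

```python
def history_range(n_histories, n_ranks, n_threads, rank, thread):
    """Half-open range [start, stop) of histories handled by one
    (MPI rank, OpenMP thread) pair in a hybrid decomposition.
    Each worker gets floor(n/workers) histories, and the first
    (n mod workers) workers get one extra."""
    workers = n_ranks * n_threads
    wid = rank * n_threads + thread          # flat worker id
    base, extra = divmod(n_histories, workers)
    start = wid * base + min(wid, extra)
    stop = start + base + (1 if wid < extra else 0)
    return start, stop
```

With 100 histories over 4 processes × 3 threads, the 12 workers receive blocks of 8 or 9 histories that tile the whole range without overlap.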

Journal Articles

Parallel computing with Particle and Heavy Ion Transport code System (PHITS)

Furuta, Takuya; Sato, Tatsuhiko; Ogawa, Tatsuhiko; Niita, Koji*; Ishikawa, Kenichi*; Noda, Shigeho*; Takagi, Shu*; Maeyama, Takuya*; Fukunishi, Nobuhisa*; Fukasaku, Kazuaki*; et al.

Proceedings of Joint International Conference on Mathematics and Computation, Supercomputing in Nuclear Applications and the Monte Carlo Method (M&C + SNA + MC 2015) (CD-ROM), 9 Pages, 2015/04

In the Particle and Heavy Ion Transport code System (PHITS), two parallel computing functions are provided to reduce computational time: distributed-memory parallelization using the message passing interface (MPI) and shared-memory parallelization using OpenMP directives. Each function has advantages and disadvantages, so by adopting both in PHITS it is possible to conduct parallel computing suited to users' needs. It is also possible to combine them into hybrid parallelization, with intra-node OpenMP parallelization and inter-node MPI parallelization, on supercomputer systems. Each parallelization function is explained together with application results obtained on a workstation and on a supercomputer, the K computer at RIKEN.

Journal Articles

Performance evaluation of HP AlphaServer SC system

Horikoshi, Masashi*; Ueshima, Yutaka; Kubo, Kenji*; Wakabayashi, Daisuke*; Nishihara, Katsunobu*

Proceedings of the Symposium on High Performance Computing and Computational Science (HPCS 2005), p.65 - 72, 2005/01

no abstracts in English

Oral presentation

Accumulating knowledge for a performance portable kinetic plasma simulation code with Kokkos and directives

Asahi, Yuichi; Latu, G.*; Bigot, J.*; Grandgirard, V.*

no journal

To prepare a performance-portable version of the kinetic plasma simulation code, we develop a simplified but self-contained semi-Lagrangian mini-app with the Kokkos performance-portable framework and OpenMP/OpenACC directives, which runs on both CPUs and GPUs. We investigate the performance of the mini-app on the novel Arm-based Fujitsu A64FX processor, Nvidia Tesla GPUs, and Intel Skylake, since Arm-based architectures and GPUs are expected to be major architectures in the exascale supercomputing era. The porting cost is kept low with both the Kokkos and the directive implementations, as code duplication is avoided. Higher performance portability is achieved with OpenMP/OpenACC, particularly for the compute-intensive kernels among the hotspots. Unfortunately, relatively low performance is obtained on A64FX for kernels with indirect memory accesses. We also discuss which Kokkos/OpenMP/OpenACC features are useful for improving readability and productivity.

Oral presentation

Optimization strategy for a performance portable kinetic plasma simulation code

Asahi, Yuichi

no journal

We present optimization strategies dedicated to a kinetic plasma simulation code that uses OpenACC/OpenMP 4.5 directives and the Kokkos performance-portable framework to run across multiple CPUs and GPUs. We evaluate the impact of the optimizations on multiple hardware platforms: Intel Xeon Skylake, Nvidia Tesla P100, and Nvidia Tesla V100. After the optimizations, the OpenACC/OpenMP version achieved speedups of 1.07 to 1.39, and the Kokkos version achieved speedups of 1.00 to 1.33. Since the impact of the optimizations is demonstrated for multiple combinations of kernels, devices, and parallel implementations, this work provides a widely applicable approach to accelerating a code while preserving performance portability. To achieve good performance on both CPUs and GPUs, Kokkos can be a reasonable choice, as it offers more flexibility for managing multiple data and loop structures within a single codebase.

Oral presentation

How to prepare the GYSELA-X code to future exascale edge-core simulations

Grandgirard, V.*; Asahi, Yuichi; Bigot, J.*; Bourne, E.*; Dif-Pradalier, G.*; Donnel, P.*; Garbet, X.*; Ghendrih, P.*

no journal

Core transport modelling in tokamak plasmas has now reached maturity, with non-linear 5D gyrokinetic codes available worldwide to address this issue. However, despite numerous successes, their predictive capabilities are still challenged, especially for optimized discharges. Bridging this gap requires extending gyrokinetic modelling to the edge and close to the material boundaries, preferably addressing edge and core transport on an equal footing. This is one of the long-term challenges for the petascale code GYSELA [V. Grandgirard et al., CPC 2017 (35)]. Edge-core turbulent plasma simulations with kinetic electrons will require exascale HPC capabilities. We present here the different strategies that we are currently exploring to exploit the billions of computing cores expected in exascale-class supercomputers: OpenMP 4.5 tasks for overlapping computations and MPI communications, Kokkos for performance-portable programming, and code refactoring.
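The overlap of computation and communication mentioned above can be sketched in miniature. This is a hedged analogy in Python using a thread pool rather than OpenMP 4.5 tasks over MPI (which is what the abstract actually refers to): a "communication" task runs in the background while work that does not depend on it proceeds. All names here are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def overlapped_step(compute_interior, exchange_halo, local_data, halo_data):
    """Overlap interior computation with a halo exchange, in the spirit of
    tasking over MPI communication: the exchange runs as a background task
    while the interior (which needs no halo values) is computed, and the
    result of the exchange is awaited only before boundary work would start."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        halo_future = pool.submit(exchange_halo, halo_data)  # "communication" task
        interior = compute_interior(local_data)              # overlapped "computation"
        halo = halo_future.result()                          # synchronize before boundary work
    return interior, halo
```

The design point is the placement of the synchronization: the wait on the exchange is deferred until its result is actually needed, so communication latency is hidden behind useful work.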
