
Fault-tolerant mechanism of both job execution and file transfer for integrated nuclear energy simulation

Tatekawa, Takayuki; Teshima, Naoya*; Suzuki, Yoshio; Takemiya, Hiroshi

By integrating simulation codes that each simulate a physical process or a part of a nuclear energy facility, large-scale and detailed simulations can be carried out. Such integrated simulations require several weeks or months of CPU time. To cope with unscheduled outages of computers or networks, we have developed a fault-tolerant mechanism for the cooperative execution of the codes. The mechanism handles abnormal termination of jobs on supercomputers and file transfer errors. When a computer goes down unexpectedly, the mechanism tries to submit the simulation job to an alternative computer. Furthermore, it detects transfer errors by comparing the size of each file before and after transfer. Because the mechanism links the jobs to the file transfers between them, the execution order of the codes can be determined from the definition of the file flow. We can therefore run integrated simulations in which the codes are executed sequentially or concurrently.
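
The three ideas in the abstract (resubmitting to an alternative computer, checking file size after transfer, and deriving execution order from file flow) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`transfer_with_size_check`, `run_with_failover`, `execution_order`, and the `submit` and `copy` callables) is hypothetical, and a local file copy stands in for a real transfer between computers.

```python
import os
import shutil
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def transfer_with_size_check(src, dst, copy=shutil.copyfile, retries=3):
    """Copy src to dst; accept the transfer only if both sizes match.

    `copy` is a hypothetical stand-in for the real transfer command.
    """
    expected = os.path.getsize(src)
    for _ in range(retries):
        try:
            copy(src, dst)
        except OSError:
            continue  # transfer itself failed, try again
        if os.path.exists(dst) and os.path.getsize(dst) == expected:
            return True  # sizes agree, treat the transfer as successful
    return False


def run_with_failover(job, hosts, submit):
    """Try each host in order; return the first host where the job succeeds.

    `submit(job, host)` is an assumed callable returning True on normal end.
    """
    for host in hosts:
        try:
            if submit(job, host):
                return host
        except ConnectionError:
            pass  # host unreachable, fall through to the next one
    raise RuntimeError(f"job {job!r} failed on all hosts")


def execution_order(produces, consumes):
    """Order jobs so that each runs after the producers of its input files."""
    producer_of = {f: job for job, files in produces.items() for f in files}
    graph = {
        job: {producer_of[f] for f in consumes.get(job, ()) if f in producer_of}
        for job in produces
    }
    return list(TopologicalSorter(graph).static_order())
```

In this sketch, jobs that share no file dependency have no ordering constraint between them, so a scheduler would be free to run them concurrently, while jobs linked by a file run sequentially.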
