Vector performance prediction of kernel loops on Earth Simulator

Yokokawa, Mitsuo; Saito, Minoru*; Hagiwara, Takashi*; Isobe, Yoko*; Jinguji, Satoshi*

The Earth Simulator is a distributed-memory parallel system consisting of 640 processor nodes connected by a full crossbar network. Each processor node is a shared-memory system composed of eight vector processors. The total peak performance and main memory capacity are 40 Tflops and 10 TB, respectively. A performance prediction system, GS$$^3$$, has been developed for the Earth Simulator to estimate the sustained performance of programs. To validate the accuracy of vector performance prediction by GS$$^3$$, the processing times estimated by GS$$^3$$ for three groups of kernel loops are compared with those measured on an SX-4. The absolute relative errors of the processing time are found to be 0.89%, 1.42%, and 6.81% on average for the three groups, respectively. The sustained performance of the three groups on a single processor of the Earth Simulator has also been estimated by GS$$^3$$, and the averages are 5.94 Gflops, 3.76 Gflops, and 2.17 Gflops, respectively.
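
As an illustration of the error figure quoted above, the average absolute relative error for a group of kernel loops could be computed as in the following minimal sketch. The abstract does not give the exact definition, so normalising by the measured time is an assumption, and the function name and sample timings are hypothetical, not data from the study.

```python
def mean_abs_relative_error(estimated, measured):
    """Average of |estimated - measured| / measured, in percent.

    Assumption: the error is normalised by the measured time;
    the abstract does not state the exact definition used.
    """
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(e - m) / m for e, m in zip(estimated, measured)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical processing times in seconds for one group of loops
# (illustrative values only, not measurements from the paper).
estimated_times = [1.01, 0.98]
measured_times = [1.00, 1.00]
group_error = mean_abs_relative_error(estimated_times, measured_times)
```

Taking the absolute value before averaging keeps over- and under-estimates from cancelling, which is presumably why the abstract reports absolute relative errors.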



