A new data conversion method for mixed precision Krylov solvers with FP16/BF16 Jacobi preconditioners

Ina, Takuya; Idomura, Yasuhiro; Imamura, Toshiyuki*; Onodera, Naoyuki

Mixed precision Krylov solvers with the Jacobi preconditioner often show significant convergence degradation when the Jacobi preconditioner is computed in low precision such as FP16 or BF16. This convergence degradation is found to result from a loss of diagonal dominance caused by roundoff errors in data conversion. To resolve this issue, we propose a new data conversion method designed to preserve the diagonal dominance of the original matrix data. The proposed method is tested by solving the Poisson equation using the conjugate gradient method, the generalized minimal residual method, and the biconjugate gradient stabilized method with the FP16/BF16 Jacobi preconditioner on NVIDIA V100 GPUs. The new data conversion is implemented by switching between the round-nearest, round-up, round-down, and round-towards-zero intrinsics in CUDA, and is called only once before the main iteration, so its cost is negligible. When the coefficients of the matrix are changed continuously by scaling the linear system, the conventional data conversion based on the round-nearest intrinsic shows periodic changes in convergence, depending on the difference in roundoff errors between the diagonal and off-diagonal coefficients. The period and magnitude of the convergence degradation depend on the bit length of the significand. In contrast, the proposed data conversion method is shown to avoid the convergence degradation completely, enabling robust mixed precision computing for the Jacobi preconditioner without extra overhead.
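
The abstract does not spell out which rounding mode is applied to which coefficient. The following is a minimal CPU-side sketch of one plausible reading, not the authors' CUDA implementation: the diagonal entry is rounded away from zero and the off-diagonal entries are rounded towards zero, so that a row that is diagonally dominant in high precision stays so in FP16. All function names are hypothetical, the example row (diagonal 6s, six off-diagonals of -s, as in a 7-point stencil) and the scaling factor s are assumptions chosen for illustration, and only FP16 is covered because Python's `struct` module has no BF16 format.

```python
import math
import struct


def _nearest_bits(x):
    """FP16 bit pattern of x after round-to-nearest-even (struct 'e' format)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def _from_bits(bits):
    """Decode an FP16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def fp16_nearest(x):
    """Conventional conversion: round to the nearest FP16 value."""
    return _from_bits(_nearest_bits(x))


def fp16_toward_zero(x):
    """Round to FP16 so that the magnitude never grows."""
    m = abs(x)
    bits = _nearest_bits(m)
    if _from_bits(bits) > m:
        bits -= 1  # previous FP16 value (patterns of non-negatives are ordered)
    return math.copysign(_from_bits(bits), x)


def fp16_away_from_zero(x):
    """Round to FP16 so that the magnitude never shrinks."""
    m = abs(x)
    bits = _nearest_bits(m)
    if _from_bits(bits) < m:
        bits += 1  # next FP16 value; becomes inf just above the FP16 maximum
    return math.copysign(_from_bits(bits), x)


def convert_row(diag, offdiags, round_diag, round_off):
    """Convert one matrix row using separate rounding rules (assumed scheme)."""
    return round_diag(diag), [round_off(a) for a in offdiags]


def is_diagonally_dominant(diag, offdiags):
    """Weak row diagonal dominance: |a_ii| >= sum over j != i of |a_ij|."""
    return abs(diag) >= sum(abs(a) for a in offdiags)


if __name__ == "__main__":
    s = 1.0 + 3 * 2.0 ** -12          # hypothetical scaling factor
    diag, offs = 6 * s, [-s] * 6      # hypothetical 7-point stencil row
    for label, rd, ro in [
        ("round-nearest", fp16_nearest, fp16_nearest),
        ("dominance-preserving", fp16_away_from_zero, fp16_toward_zero),
    ]:
        d, o = convert_row(diag, offs, rd, ro)
        print(label, d, sum(abs(a) for a in o), is_diagonally_dominant(d, o))
```

For this particular row, round-to-nearest turns the diagonal into 6.00390625 while the off-diagonal magnitudes sum to 6.005859375, so dominance is lost; the directed-rounding variant gives 6.0078125 against 6.0 and keeps it. On a GPU the same idea would presumably map onto the directed-rounding conversion intrinsics mentioned in the abstract, applied once before the main iteration.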

Language	:	English
Publication title	:	Proceedings of International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia 2023) (Internet)
Volume	:
Issue	:
Pages	:	p.29 - 34
Publication date	:	2023/02
Conference name	:	Supercomputing Asia 2023 & HPC Asia 2023
Conference date	:	2023/02
City	:	Singapore
Country	:	Singapore
Keywords	:	Krylov subspace methods; low-precision arithmetic

Patent data	:
PDF	:

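Paper URL	:	https://doi.org/10.1145/3578178.3578222
Research data DOI	:
Facilities used	:	Large-scale computers and supercomputers (Tokai)
Press release	:
Explanatory article (research outreach magazine)	:	Development of a data conversion method to accelerate nuclear simulations; Achieving convergence properties equivalent to 64-bit arithmetic in a matrix solver using 16-bit arithmetic
Contract or collaborative research partner	: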

Registration number : BB20221269
Abstract collection number : 51000519
Paper submission number : 26451