Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression
W.H. Boukaram, G. Turkiyyah, H. Ltaief, D.E. Keyes
Parallel Computing, volume 74, pp. 19-33 (2018)
GPU, QR, SVD, Batched Operations, Hierarchical, Compression
We present high performance implementations of the QR and the singular
value decomposition of a batch of small matrices hosted on the GPU with
applications in the compression of hierarchical matrices. Owing to its
simplicity and inherent parallelism, the one-sided Jacobi algorithm is used
as a building block for the randomized SVD of low-rank blocks.
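The one-sided (Hestenes) Jacobi SVD can be illustrated with a short pure-Python sketch. It applies plane rotations to pairs of columns until all columns are mutually orthogonal. It is a CPU illustration of the textbook algorithm under stated assumptions, not the authors' GPU kernels, which are specialized by the level of the GPU memory hierarchy holding the matrices. The names `jacobi_svd` and `batched_jacobi_svd`, the cyclic pair ordering, and the tolerance and sweep limits are hypothetical choices made for this example.

```python
import math


def jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """One-sided (Hestenes) Jacobi SVD of an m x n matrix `a` (list of rows), m >= n.

    Plane rotations orthogonalize the columns of U = A V. At convergence the
    column norms of U are the singular values (unsorted), the normalized
    columns are the left singular vectors, and V holds the accumulated rotations.
    """
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # working copy; converges to U * diag(sigma)
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        # One sweep visits every column pair (p, q) once (cyclic ordering).
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[k][p] ** 2 for k in range(m))
                beta = sum(u[k][q] ** 2 for k in range(m))
                gamma = sum(u[k][p] * u[k][q] for k in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # pair is already numerically orthogonal
                rotated = True
                # Rotation angle that zeroes the inner product of columns p and q.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Rotate columns p and q of U, and the same columns of V.
                for k in range(m):
                    up, uq = u[k][p], u[k][q]
                    u[k][p], u[k][q] = c * up - s * uq, s * up + c * uq
                for k in range(n):
                    vp, vq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vp - s * vq, s * vp + c * vq
        if not rotated:
            break  # converged: a full sweep needed no rotation
    sigma = [math.sqrt(sum(u[k][j] ** 2 for k in range(m))) for j in range(n)]
    for j in range(n):
        if sigma[j] > 0.0:
            for k in range(m):
                u[k][j] /= sigma[j]
    return u, sigma, v


def batched_jacobi_svd(batch, **kwargs):
    """Hypothetical batch driver: a plain loop stands in for a GPU launch that
    could, for example, assign each small matrix to its own thread block."""
    return [jacobi_svd(a, **kwargs) for a in batch]
```

For the 2 x 2 matrix `[[3, 0], [4, 5]]`, the sketch returns singular values close to √45 ≈ 6.708 and √5 ≈ 2.236, in no particular order. Only columns are touched, so rotations on disjoint column pairs are independent of one another. This is the kind of parallelism that makes the one-sided variant attractive on GPUs.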
We implement multiple kernels, each tailored to the level of the GPU memory
hierarchy in which the matrices can reside, and show substantial speedups
over streamed cuSOLVER SVDs. The resulting batched routine is a key
component of hierarchical matrix compression, opening up opportunities
to perform H-matrix arithmetic efficiently on GPUs.
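A randomized range finder can illustrate how batched QR serves low-rank block compression. Sample a block A with a Gaussian matrix Ω, take a thin QR of the sample Y = AΩ, and form B = QᵀA so that A ≈ QB. The pure-Python sketch below assumes this textbook scheme. It uses modified Gram-Schmidt for the QR purely for brevity, so it does not reproduce the paper's batched QR kernels. The names `compress_block`, `batched_compress`, `mgs_qr`, and the `drop_tol` threshold are hypothetical.

```python
import math
import random


def matmul(x, y):
    """Dense product of x (p x q) and y (q x r), both stored as lists of rows."""
    return [[sum(xi[t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for xi in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def mgs_qr(y, drop_tol=1e-10):
    """Thin QR of an m x k matrix y by modified Gram-Schmidt.

    A column whose residual falls below drop_tol times its original norm
    (a numerically dependent sample) is set to zero instead of normalized.
    """
    m, k = len(y), len(y[0])
    q = [row[:] for row in y]
    r = [[0.0] * k for _ in range(k)]
    for j in range(k):
        norm0 = math.sqrt(sum(q[t][j] ** 2 for t in range(m)))
        for i in range(j):
            # Project out the already-orthonormalized column i (MGS update).
            r[i][j] = sum(q[t][i] * q[t][j] for t in range(m))
            for t in range(m):
                q[t][j] -= r[i][j] * q[t][i]
        rjj = math.sqrt(sum(q[t][j] ** 2 for t in range(m)))
        if rjj <= drop_tol * norm0:
            for t in range(m):
                q[t][j] = 0.0
            continue
        r[j][j] = rjj
        for t in range(m):
            q[t][j] /= rjj
    return q, r


def compress_block(a, k, seed=0):
    """Randomized range finder: returns Q (m x k) and B = Q^T a (k x n), a ~= Q B."""
    rng = random.Random(seed)
    n = len(a[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]  # n x k Gaussian
    y = matmul(a, omega)          # m x k sample of the column space of a
    q, _ = mgs_qr(y)
    b = matmul(transpose(q), a)   # k x n
    return q, b


def batched_compress(batch, k, seed=0):
    """Hypothetical batch driver; a loop stands in for batched GPU kernels."""
    return [compress_block(a, k, seed + i) for i, a in enumerate(batch)]
```

When a truncated SVD of the block is wanted, an SVD of the small k x n factor B = ŨΣVᵀ (for instance with a Jacobi SVD) gives A ≈ (QŨ)ΣVᵀ. The expensive work then consists of small dense factorizations repeated over many blocks, which is what batching targets.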