KBLAS: An Optimized Library for Dense Matrix-Vector and Matrix-Matrix Operations on GPU Accelerators

KBLAS (KAUST-BLAS) is a small open-source library that optimizes critical numerical kernels on CUDA-enabled GPUs. KBLAS provides a subset of the standard BLAS functions. It also proposes some functions with a BLAS-like interface that target both single- and multi-GPU systems.
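For readers unfamiliar with BLAS, the following is a minimal, unoptimized CPU sketch of what a standard matrix-vector routine (GEMV) computes: y = alpha * A * x + beta * y, with A stored column-major. The function name ref_sgemv is hypothetical and is not part of the KBLAS API. The sketch also assumes unit strides for x and y and the no-transpose case. It illustrates only the operation, not how KBLAS implements it on the GPU.

```c
#include <stddef.h>

/* Reference single-precision GEMV (no transpose, unit strides):
 *     y = alpha * A * x + beta * y
 * A is m-by-n, stored column-major with leading dimension lda >= m,
 * following the standard BLAS storage convention.
 * NOTE: ref_sgemv is a hypothetical name for illustration only;
 * it is not a KBLAS function. */
static void ref_sgemv(size_t m, size_t n, float alpha,
                      const float *A, size_t lda,
                      const float *x, float beta, float *y)
{
    for (size_t i = 0; i < m; ++i) {
        float acc = 0.0f;
        /* Dot product of row i of A with x. */
        for (size_t j = 0; j < n; ++j) {
            acc += A[i + j * lda] * x[j];
        }
        /* When beta is zero, y is treated as write-only (its old value is ignored). */
        y[i] = alpha * acc + (beta == 0.0f ? 0.0f : beta * y[i]);
    }
}
```

A GPU library typically exposes the same mathematical operation behind a similar argument list, with the data residing in device memory.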

The ultimate goal for KBLAS is performance. KBLAS has a set of tuning parameters that affect its performance according to the GPU architecture and the CUDA runtime version. While we cannot guarantee optimal performance with the default tuning parameters, users can easily edit these parameters on their local systems. KBLAS might be shipped with autotuners in the future. For details, refer to the tuning chapter in this document.

KBLAS is tested only on Linux. It supports GPUs with compute capability 2.0 (Fermi) or higher.

KBLAS can be downloaded using the following links:

Version 1.0

Version 1.1

Version 1.2

Version 1.3-beta (support for new recursive xTRMM kernel)


Related Publications