Dense matrix computations on NUMA architectures with distance-aware work stealing
R. AlOmairy, G. Miranda, H. Ltaief, R. Badia, X. Martorell, J. Labarta, and D. Keyes
Supercomputing Frontiers and Innovations, 2, pp. 49-72, (2015)
Keywords
Dense Matrix Computations, Dynamic Runtime Systems, Software Productivity, Non-Uniform Memory Access, Data Locality, Work Stealing, High Performance Computing
Abstract
We employ the dynamic runtime system OmpSs to decrease the overhead of
data motion in the now ubiquitous non-uniform memory access (NUMA),
high-concurrency environment of multicore processors. The dense numerical
linear algebra algorithms of Cholesky factorization and symmetric matrix
inversion serve as representative benchmarks. Work stealing is performed
within an innovative NUMA-aware scheduling policy that reduces data
movement between NUMA nodes. The overall approach achieves separation of
concerns by abstracting the complexity of the hardware away from end
users, so that high productivity can be achieved. On a large NUMA system,
the resulting codes outperform state-of-the-art implementations with up
to a twofold speedup for both the Cholesky factorization and the
symmetric matrix inversion, while the OmpSs-enabled code remains strongly
similar to its original sequential version.
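The abstract does not spell out how the distance-aware stealing works. The Python sketch below illustrates the general idea only, under stated assumptions: one task queue per NUMA node, and a SLIT-style distance matrix like the one reported by `numactl --hardware` (10 = local). An idle worker first looks at its own node's queue, then at progressively more distant nodes. The names `victim_order` and `steal`, the 4-node distance values, and the task labels are hypothetical; this is not the OmpSs scheduler described in the paper.

```python
from collections import deque


def victim_order(distances, home):
    """Return all node ids sorted by NUMA distance from `home`, nearest first.

    Assumption: distances[i][j] is a SLIT-style matrix (10 = local node).
    Ties are broken by node id so the order is deterministic.
    """
    return sorted(range(len(distances)), key=lambda n: (distances[home][n], n))


def steal(queues, distances, home):
    """Take a task from the nearest non-empty per-node queue.

    Returns (node, task), or None if every queue is empty.
    Tasks are popped from the left end (FIFO) for simplicity; a real
    runtime would typically let owners and thieves use opposite ends.
    """
    for node in victim_order(distances, home):
        if queues[node]:
            return node, queues[node].popleft()
    return None


if __name__ == "__main__":
    # Hypothetical 4-node machine: nodes 0-1 on one socket, 2-3 on another.
    dist = [
        [10, 12, 20, 22],
        [12, 10, 22, 20],
        [20, 22, 10, 12],
        [22, 20, 12, 10],
    ]
    queues = [deque(), deque(), deque(["potrf(0,0)"]),
              deque(["trsm(1,0)", "syrk(1,1)"])]
    print(victim_order(dist, 1))      # [1, 0, 3, 2]
    print(steal(queues, dist, 1))     # (3, 'trsm(1,0)')
```

Ordering victims by distance rather than choosing them at random means a task is moved across the socket interconnect only when no closer work is available, which is the kind of inter-node data movement the abstract aims to reduce.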
DOI: 10.14529/jsfi150103