Power profiling of Cholesky and QR factorizations on distributed memory systems

G. Bosilca, H. Ltaief, and J. Dongarra
Computer Science - Research and Development, 29, pp. 139-147 (2014)

Keywords

Power profile analysis, Dense linear algebra, Distributed memory system, Dynamic scheduler

Abstract

This paper presents the power profiles of two high-performance dense linear algebra libraries for distributed memory systems, ScaLAPACK and DPLASMA. Algorithmically, the two libraries take opposite approaches. ScaLAPACK is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block-cyclic data distribution to achieve high parallel performance. DPLASMA is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to exploit distributed memory systems. We present performance results (Gflop/s) as well as the power profiles (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in performance (up to 2x speedup) but also in energy efficiency (up to 62%).
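
As an illustration of the two-dimensional block-cyclic data distribution mentioned in the abstract, the following minimal Python sketch maps matrix entries to the process that owns them. It assumes square blocks of size nb, a p_rows x p_cols process grid, and a distribution that starts at process (0, 0). The function names are hypothetical and are not part of the ScaLAPACK API.

```python
def block_cyclic_owner(i, j, nb, p_rows, p_cols):
    """Return the (row, col) grid coordinates of the process owning entry (i, j).

    Assumes 0-based indices, square nb x nb blocks, and a distribution
    whose first block is placed on process (0, 0).
    """
    block_row = i // nb  # index of the block row containing entry i
    block_col = j // nb  # index of the block column containing entry j
    # Blocks are dealt out cyclically along each dimension of the grid.
    return (block_row % p_rows, block_col % p_cols)


def ownership_map(n, nb, p_rows, p_cols):
    """Return, for an n x n matrix, the owner of every block as a nested list."""
    n_blocks = -(-n // nb)  # ceiling division: number of blocks per dimension
    return [
        [(bi % p_rows, bj % p_cols) for bj in range(n_blocks)]
        for bi in range(n_blocks)
    ]


if __name__ == "__main__":
    # An 8 x 8 matrix in 2 x 2 blocks on a 2 x 3 process grid.
    for row in ownership_map(8, 2, 2, 3):
        print(row)
```

Because consecutive blocks go to different processes, every process holds blocks from all regions of the matrix, which helps keep the load balanced as a factorization works through it.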

DOI: 10.1007/s00450-012-0224-2
