H. Ibeid, R. Yokota, and D. Keyes
International Journal of High Performance Computing Applications, (2015)
Exascale systems are predicted to have approximately one billion cores,
assuming gigahertz cores. Limitations on affordable network topologies for
distributed memory systems of such massive scale bring new challenges to the
current parallel programming model. Many efforts are under way to
evaluate the hardware and software bottlenecks of exascale designs. There is
therefore an urgent need to model application performance and to understand
what changes need to be made to ensure extrapolated scalability. The fast
multipole method (FMM) was originally developed for accelerating N-body
problems in astrophysics and molecular dynamics, but has recently been extended
to a wider range of problems, including preconditioners for sparse linear
solvers. Its high arithmetic intensity, combined with its linear complexity and
asynchronous communication patterns, makes it a promising algorithm for exascale
systems. In this paper, we discuss the challenges for FMM on current parallel
computers and future exascale architectures, with a focus on inter-node
communication. We develop a performance model that considers the communication
patterns of the FMM and observe a good match between our model and the actual
communication time when latency, bandwidth, network topology, and multi-core
penalties are all taken into account. To our knowledge, this is the first
formal characterization of inter-node communication in FMM that validates the
model against actual measurements of communication time.
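
To make the four ingredients named above concrete (latency, bandwidth, network topology, and multi-core penalties), the following is a minimal sketch of a generic latency-bandwidth cost estimate. It is not the model from the paper, whose equations are not given in this abstract. The function name, every parameter, and the linear form of the contention penalty are assumptions chosen purely for illustration.

```python
def comm_time(n_messages, bytes_per_message, latency_s, bandwidth_Bps,
              avg_hops=1.0, per_hop_latency_s=0.0,
              cores_per_node=1, contention_factor=0.0):
    """Estimate communication time in seconds with a simple latency-bandwidth model.

    All parameters are hypothetical illustration values, not taken from the paper:
      - avg_hops and per_hop_latency_s stand in for network topology effects;
      - cores_per_node and contention_factor stand in for a multi-core penalty,
        modelled here (as an assumption) as cores sharing one network interface.
    """
    # Latency term: each message pays a fixed startup cost plus a per-hop cost.
    latency_term = n_messages * (latency_s + avg_hops * per_hop_latency_s)

    # Multi-core penalty: effective bandwidth shrinks as more cores share the link.
    effective_bw = bandwidth_Bps / (1.0 + contention_factor * (cores_per_node - 1))

    # Bandwidth term: total bytes moved divided by the effective bandwidth.
    bandwidth_term = n_messages * bytes_per_message / effective_bw

    return latency_term + bandwidth_term


if __name__ == "__main__":
    # Ten 1 kB messages on a link with 1 microsecond latency and 1 GB/s bandwidth.
    print(comm_time(10, 1000, 1e-6, 1e9))
```

In a sketch like this, small messages are dominated by the latency term and large messages by the bandwidth term, which is why a model that omits either one tends to mismatch measured communication time.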