M. Abduljabbar, G.S. Markomanolis, H. Ibeid, R. Yokota, D.E. Keyes
32nd International Conference, ISC High Performance (2017)
Reduction of communication and efficient partitioning are key issues for achieving scalability in hierarchical N-Body algorithms like FMM. In the present work, we propose four independent strategies to improve partitioning and reduce communication. First of all, we show that the conventional wisdom of using space-filling curve partitioning may not work well for boundary integral problems, which constitute about 50% of FMM’s application user base. We propose an alternative method which modifies orthogonal recursive bisection to solve the cell-partition misalignment that has kept it from scaling previously. Secondly, we optimize the granularity of communication to find the optimal balance between a bulk-synchronous collective communication of the local essential tree and an RDMA per task per cell. Finally, we take the dynamic sparse data exchange proposed by Hoefler et al. and extend it to a hierarchical sparse data exchange, which is demonstrated at scale to be faster than the MPI library’s MPI_Alltoallv that is commonly used.
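
For readers unfamiliar with orthogonal recursive bisection, the following is a minimal sketch of the plain, unmodified technique, not the modified variant proposed in the paper (the abstract does not specify how the cell-partition misalignment is resolved). The function name `orb_partition`, the choice of the longest bounding-box axis, and the point-count balancing rule are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def orb_partition(points: Sequence[Point], n_parts: int) -> List[List[int]]:
    """Split point indices into n_parts groups by plain orthogonal recursive bisection.

    Hypothetical helper for illustration only; it balances point counts and
    does not align cuts with tree cells as the paper's modified method does.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")

    def split(indices: List[int], parts: int) -> List[List[int]]:
        if parts == 1 or len(indices) <= 1:
            # Pad with empty groups so the caller always receives `parts` lists.
            return [indices] + [[] for _ in range(parts - 1)]

        # Choose the axis along which the bounding box of this group is longest.
        extents = []
        for axis in range(3):
            coords = [points[i][axis] for i in indices]
            extents.append(max(coords) - min(coords))
        axis = extents.index(max(extents))

        # Sort along that axis and cut so that each side's share of points
        # matches its share of the remaining partitions.
        ordered = sorted(indices, key=lambda i: points[i][axis])
        left_parts = parts // 2
        cut = len(ordered) * left_parts // parts
        return split(ordered[:cut], left_parts) + split(ordered[cut:], parts - left_parts)

    return split(list(range(len(points))), n_parts)


if __name__ == "__main__":
    grid = [(float(x), float(y), float(z))
            for x in range(4) for y in range(2) for z in range(2)]
    for rank, owned in enumerate(orb_partition(grid, 4)):
        print(rank, owned)
```

Each returned list holds the indices of the points assigned to one partition. Because the cuts follow point counts alone, they can fall through the middle of a tree cell, which is the kind of misalignment the abstract refers to.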