T. Malas, G. Hager, H. Ltaief, and D. Keyes
Workshop on High Performance Computing for Upstream (2015)
Today’s high-end multicore systems are characterized by a deep memory
hierarchy, i.e., several levels of local and shared caches, with limited
size and bandwidth per core. The ever-increasing gap between
processor and memory speeds will further exacerbate the problem and has
led the scientific community to revisit numerical software
implementations to better suit the underlying memory subsystem for
performance (data reuse) as well as energy efficiency (data locality).
The authors propose a novel multi-threaded wavefront diamond blocking
(MWD) implementation in the context of stencil computations, which
represents the core operation for seismic imaging in the oil industry. The
stencil diamond formulation introduces temporal blocking for high data
reuse in the upper cache levels. The wavefront optimization technique
ensures data locality by allowing multiple threads to share common
adjacent stencil points. Therefore, MWD addresses the
aforementioned challenges by alleviating the cache size limitation and
relieving pressure on memory bandwidth. Performance comparisons
against the optimized standard 25-point stencil seismic imaging scheme,
using spatial and temporal blocking, demonstrate the effectiveness of
MWD.
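The ordering idea behind diamond temporal blocking can be illustrated with a minimal, sequential Python sketch. It is not the authors' MWD implementation. It assumes a 1D domain, a symmetric radius-4 stencil (the 1D analogue of one axis of the 25-point star stencil, which has 1 + 6 × 4 = 25 points in 3D) with hypothetical weights `W`, and fixed boundary cells. For clarity it also stores every time level, where a real implementation would keep only cache-resident buffers. The names `naive`, `diamond` and `apply_stencil` are illustrative. Multi-threading and the intra-diamond wavefront that lets threads share stencil points are not modelled.

```python
from collections import defaultdict

R = 4  # stencil radius: 2*R + 1 = 9 points, i.e. one axis of a 25-point star
W = (0.5, 0.15, 0.05, 0.03, 0.02)  # hypothetical symmetric weights, W[0] = centre


def apply_stencil(prev, x):
    """Update one point from time level `prev` with a symmetric radius-R stencil."""
    s = W[0] * prev[x]
    for r in range(1, R + 1):
        s += W[r] * (prev[x - r] + prev[x + r])
    return s


def naive(u0, steps):
    """Reference: sweep the whole domain once per time step (no temporal reuse)."""
    cur = list(u0)
    for _ in range(steps):
        nxt = list(cur)  # the R boundary cells on each side stay fixed
        for x in range(R, len(cur) - R):
            nxt[x] = apply_stencil(cur, x)
        cur = nxt
    return cur


def diamond(u0, steps, size):
    """Same computation, executed tile by tile in diamond order.

    Point (t, x) maps to rotated coordinates a = R*t + x and b = R*t - x.
    Every dependency (t-1, x+d) with |d| <= R has a' <= a and b' <= b, so
    square tiles of side `size` in (a, b) space (diamonds in (t, x) space)
    may run in order of i + j; tiles with equal i + j are mutually independent.
    """
    u0 = list(u0)
    n = len(u0)
    # Full space-time storage for clarity; None marks "not yet computed".
    u = [u0] + [u0[:R] + [None] * (n - 2 * R) + u0[n - R:] for _ in range(steps)]
    tiles = defaultdict(list)
    for t in range(1, steps + 1):
        for x in range(R, n - R):
            tiles[((R * t + x) // size, (R * t - x) // size)].append((t, x))
    for key in sorted(tiles, key=lambda ij: ij[0] + ij[1]):
        for t, x in sorted(tiles[key]):  # increasing t inside one diamond
            # Every input must already exist if the diamond order is valid.
            assert all(u[t - 1][x + d] is not None for d in range(-R, R + 1))
            u[t][x] = apply_stencil(u[t - 1], x)
    return u[steps]
```

Because each point is computed from identical inputs with identical floating-point operations, `diamond(u0, steps, size)` should reproduce `naive(u0, steps)` exactly for any tile size. Only the traversal order changes, and that order is what lets a diamond's working set stay in cache across several time steps.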