Bottleneck in scientific codes
KAUST applications:
Explicit high-order methods for hyperbolic PDEs (Ketcheson)
Inverse seismic imaging (Schuster)
Navier-Stokes-Korteweg (Calo)
Plasmoid simulations (Samtaney)
Poor utilization of the PowerPC 450's SIMD unit and dual-issue pipeline
The PowerPC 450 executes in order, so instructions must be reordered in software to complete earlier
The 3-point stencil is a building block for many stencils
Fully utilizes SIMD-like capabilities with no wasted instructions
Simplifies coding with a faster development-testing loop
Automates out-of-order scheduling and cycle-accurate performance modeling
Accurate modeling of the PowerPC 450 pipeline enables optimization
1.72x speedup over the best published results for large problem sizes [1]
2.16x speedup over optimized C code for domain sizes that fit in the L1 cache
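For reference, the 3-point stencil named above as a building block can be sketched in plain C as follows. This is a minimal scalar illustration, not the optimized PowerPC 450 kernel: the function name `stencil3`, the weights `w0`, `w1`, `w2`, and the choice to copy boundary points unchanged are all assumptions made for this sketch.

```c
#include <stddef.h>

/* One sweep of a 1D 3-point stencil (illustrative sketch only):
 *   out[i] = w0*in[i-1] + w1*in[i] + w2*in[i+1]   for interior points.
 * Boundary points are copied unchanged; this boundary treatment is an
 * assumption of the sketch, not something the poster specifies. */
void stencil3(const double *in, double *out, size_t n,
              double w0, double w1, double w2)
{
    if (n == 0)
        return;
    out[0] = in[0];              /* left boundary */
    if (n == 1)
        return;
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = w0 * in[i - 1] + w1 * in[i] + w2 * in[i + 1];
    out[n - 1] = in[n - 1];      /* right boundary */
}
```

Each interior point performs three loads and one store for five floating-point operations, so the loop is a natural target for SIMD pairing and careful instruction scheduling.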
1. K. Datta, ``Auto-tuning Stencil Codes for Cache-Based Multicore Platforms'', PhD thesis, EECS Department, University of California, Berkeley, December 2009
2. T. Malas, A. Ahmadia, J. Brown, J. Gunnels, and D. Keyes, ``Optimizing the Performance of Streaming Numerical Kernels on the IBM Blue Gene/P PowerPC 450 Processor'', conditionally accepted, IJHPCA