Finite difference method – Laplacian part 2 (amd-lab-notes)
In this post we introduce two common optimizations that can be applied to the kernel to reduce data movement and bring us closer to the new peak: loop tiling to explicitly reduce memory loads and re-order the memory access pattern to improve caching.