A Priori Loop Nest Normalization: Automatic Loop Scheduling in Complex Applications
The same computations are often expressed differently across software projects and programming languages. In particular, the way computations involving loops are expressed varies because loops can be permuted and composed in many ways. Since each variant may have unique performance properties, automatic approaches to loop scheduling must support many different optimization recipes. In this paper, we propose a priori loop nest normalization to align loop nests and reduce this variation before optimization. Specifically, we define and apply normalization criteria that map loop nests with different memory access patterns to the same canonical form. Since the memory access pattern is sensitive to loop variations and critical for performance, this normalization allows many loop nests to be optimized by the same optimization recipe. To evaluate our approach, we apply the normalization together with optimizations designed only for the canonical form, improving the performance of many different loop nest variants. Across multiple implementations of 15 benchmarks using different languages, we outperform a baseline compiler in C on average by a factor of 21.13, state-of-the-art auto-schedulers such as Polly and the Tiramisu auto-scheduler by factors of 2.31 and 2.89, and performance-oriented Python-based frameworks such as NumPy, Numba, and DaCe by factors of 9.04, 3.92, and 1.47. Furthermore, we apply the concept to the CLOUDSC cloud microphysics scheme, an actively used component of the Integrated Forecasting System, achieving a 10% speedup over the highly tuned Fortran code.
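To make the idea of a canonical form concrete, the following is a minimal sketch and not the normalization criteria defined in the paper. It assumes a single hypothetical criterion: order the loops so that the iterator with the smallest summed row-major stride is innermost. It also assumes the loop nest is fully permutable (no dependence forbids reordering). The function names `iterator_strides` and `normalize` and the access encoding are illustrative only.

```python
from math import prod


def iterator_strides(accesses, shapes):
    """Sum, per loop iterator, the row-major stride it induces across all accesses.

    accesses: list of (array_name, tuple_of_iterator_names), one iterator per dimension.
    shapes:   dict mapping array_name to its shape tuple.
    """
    strides = {}
    for array, indices in accesses:
        shape = shapes[array]
        for dim, iterator in enumerate(indices):
            # In row-major layout, stepping dimension `dim` skips over
            # all elements of the dimensions to its right.
            stride = prod(shape[dim + 1:])
            strides[iterator] = strides.get(iterator, 0) + stride
    return strides


def normalize(loop_order, accesses, shapes):
    """Return a canonical loop order (outermost first).

    Assumed criterion (hypothetical): largest summed stride outermost,
    smallest innermost; ties are broken by iterator name for determinism.
    Assumes every permutation of the loops is legal.
    """
    strides = iterator_strides(accesses, shapes)
    return sorted(loop_order, key=lambda it: (-strides.get(it, 0), it))


# Matrix multiplication C[i][j] += A[i][k] * B[k][j] on 4x4 arrays,
# written with two different loop orders.
shapes = {"A": (4, 4), "B": (4, 4), "C": (4, 4)}
accesses = [("C", ("i", "j")), ("A", ("i", "k")), ("B", ("k", "j"))]

variant_ijk = normalize(["i", "j", "k"], accesses, shapes)
variant_kji = normalize(["k", "j", "i"], accesses, shapes)
print(variant_ijk, variant_kji)
```

Under this assumed criterion, both variants map to the same loop order, so a single optimization recipe written for that order would apply to either input. A real normalization also has to check legality and handle loop composition, which this sketch leaves out.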
Tue 4 Mar (times shown in Pacific Time, US & Canada)
11:20 - 12:20 | Optimizations & Transformations (2) | Main Conference at Casuarina Ballroom (Level 2) | Chair(s): Sebastian Hack (Saarland University, Saarland Informatics Campus)
11:20 | 20m Talk | PreFix: Optimizing the Performance of Heap-Intensive Applications | Main Conference | Chaitanya Mamatha Ananda (University of California Riverside), Rajiv Gupta (University of California at Riverside, UCR), Sriraman Tallam (Google Inc.), Han Shen (Google Inc), David Li (Google)
11:40 | 20m Talk | A Priori Loop Nest Normalization: Automatic Loop Scheduling in Complex Applications | Main Conference | Lukas Trümper (Daisytuner), Philipp Schaad (ETH Zurich), Berke Ates (ETH Zurich), Alexandru Calotoiu (ETH Zurich), Marcin Copik (ETH Zurich), Torsten Hoefler (ETH Zurich)
12:00 | 20m Talk | An Efficient Polynomial Multiplication Derived Implementation Of Convolution in Neural Networks | Main Conference | Haoke Xu (University of Delaware), Yulin Zhang (Minzu University of China), Zitong Cheng (University of Delaware), Xiaoming Li (University of Delaware)