Most compilers convert all iterations of a vectorizable loop into vector operations to decrease processing time. This paper proposes Scalar Interpolation, a technique that inserts scalar operations into vectorized loops to increase the utilization of execution units in processors with distinct pipelines for scalar and vector processing. Scalar interpolation inserts scalar operations for an entire iteration of the sequential loop to avoid data movements between vector and scalar registers. A challenge to introducing scalar interpolation is creating a static cost model to guide the compiler's decision to interpolate scalar operations in a loop. An alternative to a static cost model is to perform auto-tuning in a loop to dynamically discover a sweet spot for the scalar interpolation factor. A performance study on an LLVM-based prototype reveals speedups of up to 30% on Intel Xeon (x86) with the static cost model, and 43% on Kunpeng 920 (AArch64) with auto-tuning.
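
The transformation can be pictured as interleaving one or more iterations of the original sequential loop, executed with scalar instructions, into each trip of the vectorized loop. The sketch below is only illustrative and is not taken from the paper's prototype: it assumes a saxpy-like loop, a 4-wide SSE vectorization, and an interpolation factor of one scalar iteration per vector iteration; the function and variable names are hypothetical.

#include <immintrin.h>  /* SSE intrinsics; x86-only illustration */
#include <stddef.h>

/* Illustrative sketch of scalar interpolation (not the paper's code):
 * each loop trip does 4 elements on the vector pipeline and 1 full
 * sequential iteration on the scalar pipeline. */
void saxpy_interp(float *restrict y, const float *restrict x,
                  float a, size_t n) {
    __m128 va = _mm_set1_ps(a);
    size_t i = 0;
    for (; i + 5 <= n; i += 5) {
        /* Vector part: elements i .. i+3. */
        __m128 vx = _mm_loadu_ps(&x[i]);
        __m128 vy = _mm_loadu_ps(&y[i]);
        _mm_storeu_ps(&y[i], _mm_add_ps(_mm_mul_ps(va, vx), vy));
        /* Scalar-interpolated part: element i+4, kept entirely in
         * scalar registers, so no vector<->scalar moves are needed. */
        y[i + 4] = a * x[i + 4] + y[i + 4];
    }
    for (; i < n; ++i)  /* scalar remainder loop */
        y[i] = a * x[i] + y[i];
}

Executing the interpolated iteration entirely in scalar registers is what avoids the vector-to-scalar data movement mentioned in the abstract; the scalar work can issue on the otherwise idle scalar pipeline while the vector units process the 4-wide body.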

Mon 3 Mar

Displayed time zone: Pacific Time (US & Canada)

11:20 - 12:20
Optimizations & Transformations (1) (Main Conference) at Casuarina Ballroom (Level 2)
Chair(s): Oleksandr Zinenko
11:20
20m
Talk
SySTeC: A Symmetric Sparse Tensor Compiler
Main Conference
Radha Patel (Massachusetts Institute of Technology), Willow Ahrens (Massachusetts Institute of Technology), Saman Amarasinghe (Massachusetts Institute of Technology)
11:40
20m
Talk
Pattern Matching in AI Compilers and its Formalization
Main Conference
Joseph W. Cutler (University of Pennsylvania), Alexander Collins (NVIDIA), Bin Fan (NVIDIA), Mahesh Ravishankar, Vinod Grover (NVIDIA)
12:00
20m
Talk
Scalar Interpolation: A Better Balance Between Vector and Scalar Execution for SuperScalar Architectures
Main Conference
Reza Ghanbari (University of Alberta), Henry Kao (Huawei Technologies Canada), João P. L. De Carvalho (AMD), Ehsan Amiri (Huawei Technologies Canada), Jose Nelson Amaral (University of Alberta)