Vectorization is a powerful optimization technique that significantly boosts the performance of high-performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. At the same time, writing vectorized code manually using compiler intrinsics remains a complex, error-prone task that demands deep knowledge of the target architecture and compiler.
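To make that gap concrete, here is a minimal illustration of our own (not taken from the paper): a scalar loop and a hand-vectorized AVX equivalent that processes eight floats per iteration.

    #include <immintrin.h>

    /* Scalar version: one array element per iteration. */
    void add_scalar(float *a, const float *b, int n) {
        for (int i = 0; i < n; i++)
            a[i] += b[i];
    }

    /* Hand-written AVX version: eight floats per iteration.
       Assumes n is a multiple of 8; a real kernel also needs a
       scalar tail loop for the leftover elements. */
    void add_avx(float *a, const float *b, int n) {
        for (int i = 0; i < n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            _mm256_storeu_ps(a + i, _mm256_add_ps(va, vb));
        }
    }

Even in this trivial case, the intrinsics version hard-codes the vector width and the tail-handling strategy, which is exactly the kind of expertise the abstract refers to.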

In this paper, we evaluate the potential of large language models (LLMs) to generate vectorized (Single Instruction, Multiple Data) code from scalar programs that process individual array elements. We propose a novel finite-state-machine-based multi-agent approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high-performance vectorized code with runtime speedups ranging from 1.1x to 9.4x compared to state-of-the-art compilers such as the Intel Compiler, GCC, and Clang.
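As a rough sketch of how such a finite-state-machine agent loop can be organized (the states and helper functions below are hypothetical placeholders, not the paper's actual design):

    /* Hypothetical generate-test-repair FSM; llm_generate, run_tests,
       llm_repair, and test_log are placeholder declarations standing in
       for the LLM agents and the test harness, not real APIs. */
    const char *llm_generate(const char *scalar_src);
    const char *llm_repair(const char *code, const char *failure_log);
    const char *test_log(void);
    int run_tests(const char *code);

    typedef enum { GENERATE, TEST, REPAIR, ACCEPT, REJECT } State;

    const char *vectorize(const char *scalar_src, int max_attempts) {
        State s = GENERATE;
        const char *candidate = 0;
        int attempts = 0;
        for (;;) {
            switch (s) {
            case GENERATE:        /* ask an LLM agent for SIMD code */
                candidate = llm_generate(scalar_src);
                s = TEST;
                break;
            case TEST:            /* compile and run differential tests */
                s = run_tests(candidate) ? ACCEPT : REPAIR;
                break;
            case REPAIR:          /* feed test failures back to the LLM */
                if (++attempts >= max_attempts) { s = REJECT; break; }
                candidate = llm_repair(candidate, test_log());
                s = TEST;
                break;
            case ACCEPT:
                return candidate; /* candidate passed all tests */
            case REJECT:
                return 0;         /* give up after max_attempts repairs */
            }
        }
    }

The explicit states make the feedback loop auditable: every candidate either passes the tests or is routed back to the LLM with the failure log until the attempt budget runs out.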

To verify the correctness of the vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques that improve Alive2's scalability on our benchmarks. Overall, our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
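For a flavor of what such a check looks like in practice (a sketch under our own assumptions; file names are hypothetical and exact flags may differ), one can lower both versions to LLVM IR and ask Alive2's alive-tv tool whether the vectorized IR refines the scalar IR:

    #include <stdlib.h>

    /* Sketch of one validation step: lower both versions to LLVM IR,
       then run Alive2's alive-tv to check that vector.ll refines
       scalar.ll. File names are hypothetical; on failure, alive-tv
       reports a counterexample. */
    int main(void) {
        system("clang -S -emit-llvm -O0 scalar.c -o scalar.ll");
        system("clang -S -emit-llvm -O0 vector.c -o vector.ll");
        return system("alive-tv scalar.ll vector.ll");
    }

Because bounded translation validation unrolls loops a fixed number of times, such checks can time out on larger kernels, which is one reason scalability techniques like those the authors describe are needed.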

Mon 3 Mar

Displayed time zone: Pacific Time (US & Canada)

14:00 - 15:20
ML Tools & Optimization (Main Conference) at Casuarina Ballroom (Level 2)
Chair(s): Jeronimo Castrillon TU Dresden, Germany
14:00
20m
Talk
VEGA: Automatically Generating Compiler Backends Using a Pre-Trained Transformer Model
Main Conference
Ming Zhong (SKLP, Institute of Computing Technology, CAS); Fang Lv (Institute of Computing Technology, Chinese Academy of Sciences); Lulin Wang (SKLP, ICT, CAS, Beijing, China); Lei Qiu (SKLP, Institute of Computing Technology, CAS and University of Chinese Academy of Sciences); Yingying Wang (SKLP, ICT, CAS, Beijing, China); Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences); Huimin Cui (Institute of Computing Technology, Chinese Academy of Sciences); Xiaobing Feng (ICT, CAS); Jingling Xue (UNSW Sydney)
14:20
20m
Talk
IntelliGen: Instruction-Level Auto-Tuning for Tensor Program with Monotonic Memory Optimization
Main Conference
Zixuan Ma, Haojie Wang, Jingze Xing, Shuhong Huang, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Mingshu Zhai, Shizhi Tang, Penghan Wang, Jidong Zhai (Tsinghua University)
14:40
20m
Talk
GraalNN: Context-Sensitive Static Profiling with Graph Neural Networks
Main Conference
Lazar Milikic, Milan Cugurovic, Vojin Jovanovic (Oracle Labs)
15:00
20m
Talk
LLM-Vectorizer: LLM-based Verified Loop Vectorizer
Main Conference
Jubi Taneja (Microsoft Research); Avery Laird (University of Toronto); Cong Yan (Microsoft Research); Madan Musuvathi (Microsoft Research); Shuvendu K. Lahiri (Microsoft Research)