Vectorization is a powerful optimization technique that significantly boosts the performance of high-performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. On the other hand, writing vectorized code manually using compiler intrinsics remains a complex, error-prone task that demands deep knowledge of the target architecture and compiler.
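As a concrete illustration of the manual effort the abstract refers to (not an example taken from the paper), the sketch below contrasts a scalar loop with a hand-vectorized AVX2 version written with compiler intrinsics. The function names add_scalar and add_avx2 are hypothetical, and the loop is assumed to be free of cross-iteration dependences.

/* Scalar element-wise addition: the form auto-vectorizers often start from. */
#include <immintrin.h>
#include <stddef.h>

void add_scalar(float *restrict c, const float *restrict a,
                const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Manually vectorized with AVX2 intrinsics: 8 floats per iteration,
 * plus a scalar tail loop for the remaining elements. */
void add_avx2(float *restrict c, const float *restrict a,
              const float *restrict b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)   /* remainder */
        c[i] = a[i] + b[i];
}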
In this paper, we evaluate the potential of large language models (LLMs) to generate vectorized (Single Instruction, Multiple Data) code from scalar programs that process individual array elements. We propose a novel finite-state-machine-based multi-agent approach that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs are capable of producing high-performance vectorized code, with runtime speedups ranging from 1.1x to 9.4x compared to state-of-the-art compilers such as the Intel compiler, GCC, and Clang.
To verify the correctness of vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques to improve the scalability of Alive2 on our benchmark dataset. Overall, our approach is able to verify 38.2% of vectorizations as correct on the TSVC benchmark dataset.
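A minimal sketch of how such a finite-state-machine-driven generate/test/verify loop could be organized, with the LLM agents, the test harness, and the Alive2 translation-validation step stubbed out. All helper names (llm_generate, run_tests, llm_repair, alive2_validate) are hypothetical placeholders, not the paper's implementation.

#include <stdbool.h>
#include <stdio.h>

typedef enum { GENERATE, TEST, REPAIR, VERIFY, ACCEPT, REJECT } State;

/* Hypothetical stubs standing in for the LLM vectorizer agent,
 * a unit-test harness, and bounded translation validation. */
static bool llm_generate(const char *scalar, char *out)  { (void)scalar; (void)out; return true; }
static bool run_tests(const char *candidate)             { (void)candidate; return true; }
static bool llm_repair(const char *candidate, char *out) { (void)candidate; (void)out; return true; }
static bool alive2_validate(const char *candidate)       { (void)candidate; return true; }

int main(void) {
    char candidate[1 << 16] = {0};
    const char *scalar = "for (i = 0; i < n; i++) c[i] = a[i] + b[i];";
    int attempts = 0;
    State s = GENERATE;

    while (s != ACCEPT && s != REJECT) {
        switch (s) {
        case GENERATE:                /* ask the LLM for a SIMD candidate */
            s = llm_generate(scalar, candidate) ? TEST : REJECT;
            break;
        case TEST:                    /* feed test failures back to the model */
            s = run_tests(candidate) ? VERIFY
                                     : (++attempts < 5 ? REPAIR : REJECT);
            break;
        case REPAIR:
            s = llm_repair(candidate, candidate) ? TEST : REJECT;
            break;
        case VERIFY:                  /* bounded translation validation */
            s = alive2_validate(candidate) ? ACCEPT : REJECT;
            break;
        default:
            break;
        }
    }
    printf("%s\n", s == ACCEPT ? "verified vectorization" : "fall back to scalar code");
    return 0;
}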
Mon 3 Mar (times in Pacific Time, US & Canada)

14:00 - 15:20 | ML Tools & Optimization (Main Conference) | Casuarina Ballroom (Level 2) | Chair(s): Jeronimo Castrillon (TU Dresden, Germany)

14:00 (20m) Talk | VEGA: Automatically Generating Compiler Backends Using a Pre-Trained Transformer Model | Main Conference | Ming Zhong (SKLP, Institute of Computing Technology, CAS), Fang Lv (Institute of Computing Technology, Chinese Academy of Sciences), Lulin Wang (SKLP, ICT, CAS, Beijing, China), Lei Qiu (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences), Yingying Wang (SKLP, ICT, CAS, Beijing, China), Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences), Huimin Cui (Institute of Computing Technology, Chinese Academy of Sciences), Xiaobing Feng (ICT, CAS), Jingling Xue (UNSW Sydney)

14:20 (20m) Talk | IntelliGen: Instruction-Level Auto-Tuning for Tensor Program with Monotonic Memory Optimization | Main Conference | Zixuan Ma, Haojie Wang, Jingze Xing, Shuhong Huang, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Mingshu Zhai, Shizhi Tang, Penghan Wang, and Jidong Zhai (Tsinghua University)

14:40 (20m) Talk | GraalNN: Context-Sensitive Static Profiling with Graph Neural Networks | Main Conference

15:00 (20m) Talk | LLM-Vectorizer: LLM-based Verified Loop Vectorizer | Main Conference | Jubi Taneja (Microsoft Research), Avery Laird (University of Toronto), Cong Yan (Microsoft Research), Madan Musuvathi (Microsoft Research), Shuvendu K. Lahiri (Microsoft Research)