Tensor compilers play a critical role in optimizing deep neural networks (DNNs), and memory performance has emerged as a key bottleneck in code generation for DNN models. Existing tensor compilers are constrained by inefficient auto-tuning algorithms: they either rely on coarse-grained descriptions, missing potential optimizations, or struggle with vast search spaces, rendering auto-tuning impractical. Tensor compilers therefore require a more holistic optimization of memory performance to overcome these constraints.

To address this issue, we focus the optimization objective on memory performance, which allows us to design monotonic optimization methods that significantly improve the efficiency of auto-tuning and thus make auto-tuning feasible on a fine-grained description. Based on these observations, we propose IntelliGen, a tensor compiler with instruction-level auto-tuning and monotonic memory optimization. We design an instruction-level graph description, iGraph, and a monotonic optimization method that operates on it. Benefiting from auto-tuning over this fine-grained description, IntelliGen achieves speedups of up to $3.13\times$, $3.56\times$, and $16.91\times$ (averaging $1.46\times$, $1.85\times$, and $2.30\times$, respectively) over the most efficient existing frameworks on NVIDIA GPUs, AMD GPUs, and Cambricon MLUs.
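
To make the idea concrete, below is a minimal, hypothetical sketch (in Python) of what a monotonic auto-tuning loop could look like: a candidate rewrite of the instruction-level graph is accepted only when it strictly lowers a memory-cost estimate, so the objective decreases monotonically and the search never backtracks. The names `iGraph`, `Transform`, `candidates`, and `estimate_memory_cost` are our own illustrative assumptions, not IntelliGen's actual API.

```python
# Hypothetical sketch of monotonic auto-tuning over an instruction-level
# graph. All names (iGraph, Transform, estimate_memory_cost) are
# illustrative assumptions, not IntelliGen's real interfaces.
from typing import Callable, Iterable

def monotonic_tune(
    graph: "iGraph",
    candidates: Callable[["iGraph"], Iterable["Transform"]],
    estimate_memory_cost: Callable[["iGraph"], float],
) -> "iGraph":
    """Greedy hill-climbing: apply a rewrite only if it strictly
    reduces the estimated memory cost, so the objective decreases
    monotonically and the search terminates without backtracking."""
    cost = estimate_memory_cost(graph)
    improved = True
    while improved:
        improved = False
        for t in candidates(graph):
            new_graph = t.apply(graph)  # rewrite a few instructions
            new_cost = estimate_memory_cost(new_graph)
            if new_cost < cost:  # monotonicity: accept strict improvements only
                graph, cost = new_graph, new_cost
                improved = True
                break  # restart candidate enumeration from the new state
    return graph
```

Because every accepted step strictly improves the objective, the loop terminates after finitely many improvements, which is the property that keeps the search tractable even at instruction granularity.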

Mon 3 Mar

Displayed time zone: Pacific Time (US & Canada)

14:00 - 15:20
ML Tools & Optimization (Main Conference) at Casuarina Ballroom (Level 2)
Chair(s): Jeronimo Castrillon (TU Dresden, Germany)
14:00
20m
Talk
VEGA: Automatically Generating Compiler Backends Using a Pre-Trained Transformer Model
Main Conference
Ming Zhong (SKLP, Institute of Computing Technology, CAS); Fang Lv (Institute of Computing Technology, Chinese Academy of Sciences); Lulin Wang (SKLP, ICT, CAS, Beijing, China); Lei Qiu (SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences); Yingying Wang (SKLP, ICT, CAS, Beijing, China); Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences); Huimin Cui (Institute of Computing Technology, Chinese Academy of Sciences); Xiaobing Feng (ICT, CAS); Jingling Xue (UNSW Sydney)
14:20
20m
Talk
IntelliGen: Instruction-Level Auto-Tuning for Tensor Program with Monotonic Memory Optimization
Main Conference
Zixuan Ma, Haojie Wang, Jingze Xing, Shuhong Huang, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Mingshu Zhai, Shizhi Tang, Penghan Wang, and Jidong Zhai (all Tsinghua University)
14:40
20m
Talk
GraalNN: Context-Sensitive Static Profiling with Graph Neural Networks
Main Conference
Lazar Milikic, Milan Cugurovic, and Vojin Jovanovic (all Oracle Labs)
15:00
20m
Talk
LLM-Vectorizer: LLM-based Verified Loop Vectorizer
Main Conference
Jubi Taneja (Microsoft Research); Avery Laird (University of Toronto); Cong Yan (Microsoft Research); Madan Musuvathi (Microsoft Research); Shuvendu K. Lahiri (Microsoft Research)