🤖 AI Summary
This work investigates large language models’ (LLMs) capacity to comprehend compiler intermediate representations (IR), specifically evaluating their performance on four core tasks: control-flow graph (CFG) reconstruction, decompilation, code summarization, and execution reasoning. Through systematic multi-model benchmarking (GPT-4, LLaMA 3.1, Gemma 2, etc.), a curated structured IR dataset, task-specific prompt engineering, and fine-grained error attribution, the study provides the first empirical evidence of fundamental limitations in LLMs’ IR understanding—particularly in CFG reconstruction (accuracy <42%) and execution reasoning (error rate 68%). Methodologically, it introduces a dual-path enhancement paradigm: (1) IR-domain fine-tuning and (2) explicit control-flow modeling. Experimental results demonstrate that targeted fine-tuning improves task performance by up to 31.5%, establishing a foundational framework for advancing LLM-based IR analysis.
📝 Abstract
Intermediate Representations (IRs) are essential in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. This paper presents a pioneering empirical study to investigate the capabilities of LLMs, including GPT-4, GPT-3, Gemma 2, LLaMA 3.1, and Code Llama, in understanding IRs. We analyze their performance across four tasks: Control Flow Graph (CFG) reconstruction, decompilation, code summarization, and execution reasoning. Our results indicate that while LLMs demonstrate competence in parsing IR syntax and recognizing high-level structures, they struggle with control flow reasoning, execution semantics, and loop handling. Specifically, they often misinterpret branching instructions, omit critical IR operations, and rely on heuristic-based reasoning, leading to errors in CFG reconstruction, IR decompilation, and execution reasoning. The study underscores the necessity for IR-specific enhancements in LLMs, recommending fine-tuning on structured IR datasets and integration of explicit control flow models to augment their comprehension and handling of IR-related tasks.