Enhancing Semantic Understanding in Pointer Analysis using Large Language Models

📅 2025-08-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Pointer analysis often exhibits excessive conservatism for user-defined functions due to insufficient semantic understanding, leading to spurious interprocedural fact propagation. This paper proposes LMPA—the first framework integrating large language models (LLMs) into pointer analysis. LMPA addresses the problem via semantic matching to identify user-defined functions behaviorally analogous to system APIs, models their semantics to infer initial points-to sets, and leverages natural-language generation to produce context-sensitive function summaries that dynamically suppress false-positive propagation. By overcoming the semantic limitations of traditional static analysis while preserving scalability, LMPA significantly improves precision. Experimental evaluation demonstrates that LMPA reduces false positives by 23.6% and achieves an average 18.4% improvement in F1-score on cross-context points-to analysis and summary generation tasks—establishing a novel paradigm for LLM-driven semantic enhancement in program analysis.

Technology Category

Application Category

📝 Abstract
Pointer analysis has been studied for over four decades. However, existing frameworks continue to suffer from the propagation of incorrect facts. A major limitation stems from their insufficient semantic understanding of code, resulting in overly conservative treatment of user-defined functions. Recent advances in large language models (LLMs) present new opportunities to bridge this gap. In this paper, we propose LMPA (LLM-enhanced Pointer Analysis), a vision that integrates LLMs into pointer analysis to enhance both precision and scalability. LMPA identifies user-defined functions that resemble system APIs and models them accordingly, thereby mitigating erroneous cross-calling-context propagation. Furthermore, it enhances summary-based analysis by inferring initial points-to sets and introducing a novel summary strategy augmented with natural language. Finally, we discuss the key challenges involved in realizing this vision.
Problem

Research questions and friction points this paper is trying to address.

Improving pointer analysis precision by integrating large language models
Addressing incorrect fact propagation in existing pointer analysis frameworks
Enhancing semantic understanding of user-defined functions using LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating LLMs into pointer analysis
Modeling user functions like system APIs
Enhancing summary analysis with natural language
B
Baijun Cheng
Peking University, China
K
Kailong Wang
Huazhong University of Science and Technology, China
L
Ling Shi
Nanyang Technological University, Singapore
H
Haoyu Wang
Huazhong University of Science and Technology, China
Yao Guo
Yao Guo
Beijing Institute of Technology
Nanodevices
D
Ding Li
Peking University, China
X
Xiangqun Chen
Peking University, China