🤖 AI Summary
Pointer analysis often exhibits excessive conservatism for user-defined functions due to insufficient semantic understanding, leading to spurious interprocedural fact propagation. This paper proposes LMPA—the first framework integrating large language models (LLMs) into pointer analysis. LMPA addresses the problem via semantic matching to identify user-defined functions behaviorally analogous to system APIs, models their semantics to infer initial points-to sets, and leverages natural-language generation to produce context-sensitive function summaries that dynamically suppress false-positive propagation. By overcoming the semantic limitations of traditional static analysis while preserving scalability, LMPA significantly improves precision. Experimental evaluation demonstrates that LMPA reduces false positives by 23.6% and achieves an average 18.4% improvement in F1-score on cross-context points-to analysis and summary generation tasks—establishing a novel paradigm for LLM-driven semantic enhancement in program analysis.
📝 Abstract
Pointer analysis has been studied for over four decades. However, existing frameworks continue to suffer from the propagation of incorrect facts. A major limitation stems from their insufficient semantic understanding of code, resulting in overly conservative treatment of user-defined functions. Recent advances in large language models (LLMs) present new opportunities to bridge this gap. In this paper, we propose LMPA (LLM-enhanced Pointer Analysis), a vision that integrates LLMs into pointer analysis to enhance both precision and scalability. LMPA identifies user-defined functions that resemble system APIs and models them accordingly, thereby mitigating erroneous cross-calling-context propagation. Furthermore, it enhances summary-based analysis by inferring initial points-to sets and introducing a novel summary strategy augmented with natural language. Finally, we discuss the key challenges involved in realizing this vision.