Boosting Pointer Analysis With Large Language Model-Enhanced Allocation Function Detection

📅 2025-09-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Pointer analysis for C/C++ programs often loses precision because user-defined allocation functions (AFs) are modeled coarsely. This work proposes AFD, a framework that combines lightweight value-flow analysis with large language model (LLM)-driven reasoning to automatically identify AFs and infer their semantic behaviors, so that heap objects can be distinguished at each call site without full context-sensitive analysis. Evaluated on 15 real-world C projects, AFD identifies over 600 custom AFs, increases the number of modeled heap objects by 26×, reduces alias set sizes by 39%, and uncovers 17 previously undetected memory bugs, at a runtime overhead of only 1.4×. To our knowledge, this is the first pointer analysis approach to use LLM-based reasoning about AF semantics, improving both heap modeling accuracy and practical deployability.

📝 Abstract

Pointer analysis is foundational for many static analysis tasks, yet its effectiveness is often hindered by imprecise modeling of heap allocations, particularly in C/C++ programs where user-defined allocation functions (AFs) are pervasive. Existing approaches largely overlook these custom allocators, leading to coarse aliasing and reduced analysis precision. In this paper, we present AFD, a novel technique that enhances pointer analysis by automatically identifying and modeling custom allocation functions. AFD employs a hybrid approach: it uses value-flow analysis to detect straightforward wrappers and leverages Large Language Models (LLMs) to reason about more complex allocation patterns with side effects. This targeted enhancement enables precise modeling of heap objects at each call site, achieving context-sensitivity-like benefits without the associated overhead. We evaluate AFD on 15 real-world C projects, identifying over 600 custom AFs. Integrating AFD into a baseline pointer analysis yields a 26x increase in modeled heap objects and a 39% reduction in alias set sizes, with only 1.4x runtime overhead. Furthermore, our enhanced analysis improves indirect call resolution and uncovers 17 previously undetected memory bugs. These results demonstrate that precise modeling of custom allocation functions offers a scalable and practical path to improving pointer analysis in large software systems.
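The hybrid split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy IR, the function names (`xmalloc`, `xzalloc`, `pool_new`, `get_global`), and the `classify` helper are all hypothetical. The sketch assumes straight-line function bodies and treats a function as a simple wrapper when every returned value flows from a known allocator and the body has no stores. Functions that return a fresh object but also have side effects are only flagged as candidates for deeper reasoning, the part the paper delegates to an LLM.

```python
# Toy IR (hypothetical): each function body is a straight-line list of statements.
#   ("call", dst, callee)    dst = callee(...)
#   ("copy", dst, src)       dst = src
#   ("store", target, src)   *target = src   (a side effect)
#   ("ret", src)             return src
PROGRAM = {
    "xmalloc": [("call", "p", "malloc"), ("ret", "p")],
    "xzalloc": [("call", "q", "xmalloc"), ("copy", "r", "q"), ("ret", "r")],
    "pool_new": [("call", "n", "malloc"), ("store", "pool_head", "n"), ("ret", "n")],
    "get_global": [("copy", "g", "global_buf"), ("ret", "g")],
}


def classify(program, known_allocators=("malloc",)):
    """Return (simple_wrappers, complex_candidates) for the toy program."""
    allocators = set(known_allocators)
    complex_candidates = set()
    changed = True
    while changed:  # iterate so that wrappers of wrappers are also found
        changed = False
        for name, body in program.items():
            if name in allocators:
                continue
            fresh = set()      # variables holding a freshly allocated object
            has_store = False  # whether the body has a side effect
            returns = []
            for stmt in body:
                kind = stmt[0]
                if kind == "call" and stmt[2] in allocators:
                    fresh.add(stmt[1])
                elif kind == "copy" and stmt[2] in fresh:
                    fresh.add(stmt[1])
                elif kind == "store":
                    has_store = True
                elif kind == "ret":
                    returns.append(stmt[1])
            returns_fresh = bool(returns) and all(r in fresh for r in returns)
            if returns_fresh and not has_store:
                allocators.add(name)
                changed = True
            elif returns_fresh:
                complex_candidates.add(name)
    return allocators - set(known_allocators), complex_candidates


wrappers, candidates = classify(PROGRAM)
print(sorted(wrappers))    # ['xmalloc', 'xzalloc']
print(sorted(candidates))  # ['pool_new']
```

In this sketch `get_global` is rejected because its return value does not flow from an allocator, while `pool_new` is set aside rather than accepted, since a purely syntactic check cannot tell whether its store changes the allocation semantics.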

Problem

Research questions and friction points this paper is trying to address.

Enhancing pointer analysis precision by detecting custom allocation functions

Modeling complex heap allocations using LLM-enhanced reasoning techniques

Improving bug detection through precise heap object identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-enhanced allocation function detection

Hybrid value-flow and LLM reasoning approach

Context-sensitivity-like precision without the cost of full context-sensitive analysis
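To see why recognizing a wrapper yields context-sensitivity-like precision, consider a minimal sketch of a context-insensitive points-to analysis over a toy program. Everything here is an assumption made for illustration: the IR, the site labels, and the `analyze` and `may_alias` helpers are hypothetical and are not taken from the paper. When only `malloc` is treated as an allocator, both callers of `xmalloc` receive the single heap object created inside the wrapper. When `xmalloc` itself is treated as an allocator, each call site gets its own abstract object.

```python
# Toy program (hypothetical): main calls the wrapper xmalloc twice.
#   ("call", dst, callee, site)   dst = callee(...) at call site `site`
#   ("ret", src)                  return src
PROGRAM = {
    "xmalloc": [("call", "p", "malloc", "xmalloc:1"), ("ret", "p")],
    "main": [
        ("call", "a", "xmalloc", "main:1"),
        ("call", "b", "xmalloc", "main:2"),
    ],
}


def analyze(program, allocators):
    """Context-insensitive points-to sets with one heap object per allocator call site."""
    pts = {}   # (function, variable) -> set of abstract heap objects
    rets = {}  # function -> set of objects it may return
    changed = True
    while changed:  # propagate until no set grows
        changed = False
        for fn, body in program.items():
            for stmt in body:
                if stmt[0] == "call":
                    _, dst, callee, site = stmt
                    if callee in allocators:
                        new = {f"heap@{site}"}          # fresh object named after this site
                    else:
                        new = rets.get(callee, set())   # merged over all callers
                    target = pts.setdefault((fn, dst), set())
                else:  # ("ret", src)
                    new = pts.get((fn, stmt[1]), set())
                    target = rets.setdefault(fn, set())
                if not new <= target:
                    target |= new
                    changed = True
    return pts


def may_alias(pts, x, y):
    """Two variables may alias if their points-to sets overlap."""
    return bool(pts.get(x, set()) & pts.get(y, set()))


baseline = analyze(PROGRAM, {"malloc"})
enhanced = analyze(PROGRAM, {"malloc", "xmalloc"})
print(may_alias(baseline, ("main", "a"), ("main", "b")))  # True
print(may_alias(enhanced, ("main", "a"), ("main", "b")))  # False
```

The enhanced run stays context-insensitive. Only the set of functions treated as allocation sites changes, which is one way to read the claim that the precision gain comes without the cost of full context sensitivity.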
