🤖 AI Summary
Pointer analysis for C/C++ programs often loses precision because user-defined allocation functions (AFs) are modeled too coarsely. This work proposes AFD, a framework that combines lightweight value-flow analysis with large language model (LLM)-driven reasoning to automatically identify AFs and infer their semantic behavior. Modeling a separate heap object at each AF call site gives benefits similar to context sensitivity without the cost of a fully context-sensitive analysis. Evaluated on 15 real-world C projects, AFD identifies over 600 custom AFs, increases the number of modeled heap objects by 26×, reduces alias set sizes by 39%, and uncovers 17 previously undetected memory bugs, with only 1.4× runtime overhead. To our knowledge, this is the first pointer analysis approach to incorporate LLM-powered pattern understanding for AF semantics, improving both heap modeling precision and practical deployability.
📝 Abstract
Pointer analysis is foundational for many static analysis tasks, yet its effectiveness is often hindered by imprecise modeling of heap allocations, particularly in C/C++ programs where user-defined allocation functions (AFs) are pervasive. Existing approaches largely overlook these custom allocators, leading to coarse aliasing and reduced analysis precision. In this paper, we present AFD, a novel technique that enhances pointer analysis by automatically identifying and modeling custom allocation functions. AFD employs a hybrid approach: it uses value-flow analysis to detect straightforward wrappers and leverages Large Language Models (LLMs) to reason about more complex allocation patterns with side effects. This targeted enhancement enables precise modeling of heap objects at each call site, achieving context-sensitivity-like benefits without the associated overhead. We evaluate AFD on 15 real-world C projects, identifying over 600 custom AFs. Integrating AFD into a baseline pointer analysis yields a 26x increase in modeled heap objects and a 39% reduction in alias set sizes, with only 1.4x runtime overhead. Furthermore, our enhanced analysis improves indirect call resolution and uncovers 17 previously undetected memory bugs. These results demonstrate that precise modeling of custom allocation functions offers a scalable and practical path to improving pointer analysis in large software systems.
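To make the two kinds of custom allocation functions concrete, here is a minimal C sketch. It is an illustration only, not code from the paper or its benchmark projects: the names `xmalloc`, `node_create`, `node_free_all`, `struct node`, and `all_nodes` are all hypothetical. `xmalloc` is a straightforward wrapper whose return value flows directly from `malloc`, the kind of case the abstract assigns to value-flow analysis. `node_create` is a more complex pattern of the kind the abstract assigns to LLM reasoning: it hands the new object back through an out-parameter and has a side effect on global state.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical simple wrapper: the returned pointer comes straight from
 * malloc, with only an out-of-memory check in between. */
void *xmalloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Hypothetical object type used by the complex allocator below. */
struct node {
    int value;
    struct node *next;
};

/* Global registry of every node ever created (the side effect). */
static struct node *all_nodes = NULL;

/* Hypothetical complex allocator: the new object is returned through the
 * out-parameter `out`, the return value is only a status code, and the
 * object is also linked into the global list `all_nodes`. */
int node_create(struct node **out, int value) {
    struct node *n = xmalloc(sizeof *n);
    n->value = value;
    n->next = all_nodes;   /* side effect: push onto the global list */
    all_nodes = n;
    *out = n;              /* the allocation escapes via the out-parameter */
    return 0;
}

/* Release every node recorded in the global list. */
void node_free_all(void) {
    while (all_nodes != NULL) {
        struct node *next = all_nodes->next;
        free(all_nodes);
        all_nodes = next;
    }
}
```

If an analysis models only the `malloc` call inside `xmalloc` as an allocation site, then two calls such as `a = xmalloc(4)` and `b = xmalloc(4)` map to the same abstract heap object, so `a` and `b` appear to alias even though they never do at run time. Treating each call to `xmalloc` or `node_create` as its own allocation site keeps the objects apart, which is the per-call-site modeling the abstract describes.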