AI Summary
This work investigates the ability of large language models (LLMs) to perform symbolic constraint analysis for worst-case program execution, aiming to bridge neural program modeling and formal symbolic reasoning. We formally define this novel task and propose a satisfiability modulo theories (SMT) solver-aligned reinforcement learning fine-tuning paradigm that combines symbolic reasoning guidance with a purpose-built constraint dataset, enabling efficient fine-tuning of small-scale (3B-parameter) models. The resulting WARP-1.0-3B model significantly outperforms both same-sized and larger baseline models across multiple symbolic constraint analysis benchmarks. Our results demonstrate that LLMs can not only participate in but also actively drive formal program analysis, establishing a new neuro-symbolic paradigm for program understanding grounded in rigorous constraint reasoning.
Abstract
Large language models (LLMs) have been successfully applied to a variety of coding tasks, including code generation, completion, and repair. However, more complex symbolic reasoning tasks remain largely unexplored with LLMs. This paper investigates the capacity of LLMs to reason about worst-case program executions through symbolic constraint analysis, aiming to connect LLMs with symbolic reasoning approaches. Specifically, we define and address the problem of worst-case symbolic constraint analysis as a measure of LLMs' program comprehension. We evaluate the performance of existing LLMs on this novel task and further improve their capabilities through symbolic reasoning-guided fine-tuning, grounded in SMT (Satisfiability Modulo Theories) constraint solving and supported by a specially designed dataset of symbolic constraints. Experimental results show that our solver-aligned model, WARP-1.0-3B, consistently surpasses size-matched and even much larger baselines, demonstrating that, with reinforcement learning, a 3B LLM can recover the very constraints that pin down an algorithm's worst-case behaviour. These findings suggest that LLMs are capable of deeper symbolic reasoning, supporting a closer integration between neural network-based learning and formal methods for rigorous program analysis.
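To make the solver-aligned setup concrete, below is a minimal illustrative sketch (not the paper's actual pipeline) of how an SMT solver such as Z3 could check whether a model-predicted worst-case constraint is logically equivalent to a reference constraint; such a check is one plausible way to compute a solver-aligned reward during reinforcement learning fine-tuning. The example program, the reference constraint, and the `equivalent` helper are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: scoring a predicted worst-case constraint against a
# reference constraint with the Z3 SMT solver (pip install z3-solver).
# Hypothetical example program: a loop whose body executes max(n, 2*m) times,
# so for non-negative inputs the worst case occurs when 2*m >= n.
from z3 import Ints, And, Not, Solver, unsat

n, m = Ints("n m")

# Reference worst-case constraint (assumed ground truth for this sketch).
reference = And(n >= 0, m >= 0, 2 * m >= n)

# Constraint as a model's symbolic analysis might phrase it.
predicted = And(m >= 0, n >= 0, n <= m + m)

def equivalent(p, q) -> bool:
    """Return True iff p and q agree on every assignment (p <=> q is valid)."""
    s = Solver()
    s.add(Not(p == q))  # satisfiable only if some assignment separates p and q
    return s.check() == unsat

# A binary solver-aligned reward: 1.0 if the predicted constraint matches the
# reference up to logical equivalence, else 0.0.
reward = 1.0 if equivalent(predicted, reference) else 0.0
print(reward)  # 1.0 in this example
```

The equivalence test reduces to an unsatisfiability check on the formula asserting that the two constraints disagree, so syntactically different but semantically identical predictions receive the same reward.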