🤖 AI Summary
Problem: Program analysis systems need specifications for opaque Unix commands, and the lack of such specifications limits their practicality. Method: This paper introduces Caruca, an automated specification mining approach that combines large language models (LLMs) with system-level instrumentation. An LLM first translates unstructured command documentation into a structured invocation syntax; system-call and filesystem instrumentation then observes command behavior across diverse invocations and environments, extracting key properties, including input domains, side effects, and error modes. Contribution/Results: The method synthesizes specifications end to end across commands and heterogeneous documentation formats, supports multiple standard specification outputs (e.g., SMT-LIB, JSON Schema), and integrates with static analyzers. Evaluated on 60 commands, including GNU Coreutils, POSIX utilities, and third-party tools, it attains a 98.3% specification correctness rate, eliminating manual specification authoring.
📝 Abstract
A wealth of state-of-the-art systems demonstrate impressive improvements in performance, security, and reliability for programs composed of opaque components, such as Unix shell commands. To reason about commands, these systems require partial specifications. However, creating such specifications is a manual, laborious, and error-prone process, which limits the practicality of these systems. This paper presents Caruca, a system for automatic specification mining for opaque commands. To overcome the challenge of language diversity across commands, Caruca first uses a large language model to translate a command's user-facing documentation into a structured invocation syntax. Using this representation, Caruca explores the space of syntactically valid command invocations and execution environments. Caruca concretely executes each command-environment pair, interposing at the system-call and filesystem level to extract key command properties such as parallelizability and filesystem pre- and post-conditions. These properties can be exported in multiple specification formats and are immediately usable by existing systems. Applying Caruca to 60 GNU Coreutils, POSIX, and third-party commands and to several specification-dependent systems shows that it generates correct specifications for all but one case, completely eliminating manual effort from the process. Caruca currently powers the full specifications for a state-of-the-art static analysis tool.
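The explore-and-execute loop described in the abstract can be illustrated with a rough Python sketch. It is not Caruca's implementation: it assumes the structured invocation syntax has already been reduced to a list of optional flags plus fixed operands, and it swaps system-call interposition for a much cruder technique, namely diffing directory snapshots taken before and after the run. All names (`snapshot`, `enumerate_invocations`, `observe`, `STAND_IN_COPY`) are hypothetical, and the "opaque command" is a stand-in Python one-liner so the example does not depend on any particular Unix tool.

```python
import itertools
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for an opaque command: copies its second-to-last
# argument to its last argument and ignores any flags before them.
STAND_IN_COPY = [
    sys.executable, "-c",
    "import shutil, sys; shutil.copy(sys.argv[-2], sys.argv[-1])",
]

def snapshot(root):
    """Map each file path under root (relative to root) to its contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                state[os.path.relpath(path, root)] = fh.read()
    return state

def enumerate_invocations(flags, operands):
    """Yield every subset of the optional flags, followed by the operands."""
    for size in range(len(flags) + 1):
        for combo in itertools.combinations(flags, size):
            yield list(combo) + list(operands)

def observe(argv, initial_files):
    """Run argv in a fresh directory and report filesystem changes.

    initial_files maps flat file names to bytes (no subdirectories).
    Assumption: snapshot diffing only approximates what interposition
    at the system-call level would reveal (for example, it misses reads).
    """
    with tempfile.TemporaryDirectory() as root:
        for name, data in initial_files.items():
            with open(os.path.join(root, name), "wb") as fh:
                fh.write(data)
        before = snapshot(root)
        proc = subprocess.run(argv, cwd=root, capture_output=True)
        after = snapshot(root)
    return {
        "exit_code": proc.returncode,
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(
            p for p in before if p in after and before[p] != after[p]
        ),
    }
```

For instance, calling `observe(STAND_IN_COPY + args, {"src.txt": b"hello"})` for each `args` produced by `enumerate_invocations(["-v", "-n"], ["src.txt", "dst.txt"])` yields one observation per command-environment pair; a real miner would then generalize such observations into pre- and post-conditions.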