Neurosymbolic Auditing of Natural-Language Software Requirements

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Natural-language software requirements are often ambiguous, inconsistent, and underspecified, and in safety-critical systems these defects carry over into formal models and implementations. This work presents VERIMED, a neurosymbolic pipeline that pairs large language models with an SMT solver to translate medical-device software requirements into formal logic. Ambiguity is detected by generating several independent formalizations of the same requirement and testing them for SMT inequivalence; bidirectional equivalence checking turns any disagreement into a solver-checkable test. Solver queries on the resulting specification expose inconsistency, vacuousness, and safety violations. The usefulness of symbolic feedback depends on its granularity: in counterexample-guided repair on a hemodialysis question-answering benchmark, concrete SMT counterexamples raise verified accuracy from 55.4% to 98.5%. On open-source hemodialysis safety requirements, the approach reduces the number of ambiguity-sensitive requirements and supports auditing through SMT-based queries.

📝 Abstract

Natural-language software requirements are often ambiguous, inconsistent, and underspecified; in safety-critical domains, these defects propagate into formal models that verify the wrong specification and into implementations that ship unsafe behavior. We show that large language models, equipped with an SMT solver, can audit such requirements: translating them into formal logic, detecting ambiguity through stochastic variation in the generated formalization, and exposing inconsistency, vacuousness, and safety violations through solver queries on the resulting specification. We present VERIMED, a neurosymbolic pipeline that operationalizes this idea for medical-device software requirements, and report two findings. First, stochastic variation across independent formalizations is a signal of ambiguity: requirements that admit multiple plausible interpretations produce SMT-inequivalent formalizations, and bidirectional SMT equivalence checking turns this disagreement into a solver-checkable test. Second, the usefulness of symbolic feedback depends on its granularity: in counterexample-guided repair on a hemodialysis question-answering benchmark, concrete SMT counterexamples raise verified accuracy from 55.4% to 98.5%. Over an extensive experimental evaluation on open-source hemodialysis safety requirements, we show that the LLM-based approach in VERIMED successfully reduces ambiguity-sensitive requirements and enables rigorous auditing of software requirements through SMT-based queries.
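The bidirectional equivalence check described in the abstract can be illustrated with a minimal sketch. It is not the VERIMED implementation: the requirement, the variable names, and the two readings are hypothetical, and an exhaustive search over a small Boolean domain stands in for the SMT solver so that the example runs with the Python standard library alone. The assumed requirement is "stop the pump if pressure is high and air is detected or the door is open", which can be read with two different scopes for "and" and "or".

```python
from itertools import product

# Hypothetical state space: three Boolean signals (not taken from the paper).
DOMAIN = {
    "pressure_high": [False, True],
    "air_detected": [False, True],
    "door_open": [False, True],
}

def reading_a(s):
    # Reading A: (pressure_high AND air_detected) OR door_open
    return (s["pressure_high"] and s["air_detected"]) or s["door_open"]

def reading_b(s):
    # Reading B: pressure_high AND (air_detected OR door_open)
    return s["pressure_high"] and (s["air_detected"] or s["door_open"])

def states(domain):
    """Enumerate every assignment of the domain's variables."""
    names = list(domain)
    for values in product(*(domain[n] for n in names)):
        yield dict(zip(names, values))

def find_counterexample(f, g, domain):
    """Return a state where f holds but g does not (f => g fails), else None."""
    for s in states(domain):
        if f(s) and not g(s):
            return s
    return None

def check_equivalence(f, g, domain):
    """Check both directions; report a concrete counterexample for each failure."""
    forward = find_counterexample(f, g, domain)   # violates f => g
    backward = find_counterexample(g, f, domain)  # violates g => f
    return forward is None and backward is None, forward, backward

equivalent, fwd, bwd = check_equivalence(reading_a, reading_b, DOMAIN)
print("equivalent:", equivalent)
print("A holds but B does not:", fwd)
print("B holds but A does not:", bwd)
```

Under these assumptions the two readings are not equivalent, so the requirement would be flagged as ambiguity-sensitive. The state returned for the failing direction plays the role of the concrete counterexample that the abstract identifies as the useful, fine-grained form of feedback. With a real SMT solver, each direction would instead be posed as a satisfiability query on the negated implication.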

Problem

Research questions and friction points this paper is trying to address.

natural-language requirements

ambiguity

inconsistency

underspecification

safety-critical systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

neurosymbolic

SMT solver

requirement ambiguity