🤖 AI Summary
Verifying that AI systems behave as intended in safety-critical scenarios remains challenging: formal methods offer limited expressiveness, while learned evaluators of natural-language constraints suffer high false-positive and false-negative rates. To address this, we propose RepV, a neurosymbolic verifier that constructs a safety-separable latent space, in which safe and unsafe plans are linearly separable, and provides position-based probabilistic verification guarantees. RepV combines initial labels generated by a model checker, a lightweight projector, a frozen linear classifier, and LLM-generated reasoning traces into a single inference pipeline, verifying each plan in one forward pass with fewer than 0.2M added parameters and zero human annotation. Experiments across multiple planning tasks show that RepV improves compliance prediction accuracy by up to 15% over state-of-the-art baselines, and that its guarantee-driven refinement outperforms conventional fine-tuning approaches.
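To make the pipeline concrete, here is a minimal training sketch under our own assumptions; the `Projector` class, the embedding dimensions, and the training loop below are illustrative stand-ins, not the authors' released code:

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Lightweight MLP mapping a plan+rationale embedding to a
    low-dimensional, safety-separable latent space (<0.2M parameters).
    Dimensions are assumptions, not the paper's values."""
    def __init__(self, in_dim=768, hidden=128, out_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x):
        return self.net(x)

def train_verifier(seed_embeddings, seed_labels, epochs=50):
    """seed_labels (0 = unsafe, 1 = safe) come from an off-the-shelf
    model checker, so no human annotation is needed."""
    proj, head = Projector(), nn.Linear(8, 1)  # head = linear boundary
    opt = torch.optim.Adam(
        list(proj.parameters()) + list(head.parameters()), lr=1e-3
    )
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        logits = head(proj(seed_embeddings)).squeeze(-1)
        loss = loss_fn(logits, seed_labels.float())
        opt.zero_grad(); loss.backward(); opt.step()
    for p in head.parameters():  # freeze the boundary for deployment
        p.requires_grad_(False)
    return proj, head
```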
📝 Abstract
As AI systems migrate to safety-critical domains, verifying that their actions comply with well-defined rules remains a challenge. Formal methods provide provable guarantees but demand hand-crafted temporal-logic specifications, offering limited expressiveness and accessibility. Deep learning approaches enable evaluation of plans against natural-language constraints, yet their opaque decision process invites misclassifications with potentially severe consequences. We introduce RepV, a neurosymbolic verifier that unifies both views by learning a latent space where safe and unsafe plans are linearly separable. Starting from a modest seed set of plans labeled by an off-the-shelf model checker, RepV trains a lightweight projector that embeds each plan, together with a language-model-generated rationale, into a low-dimensional space; a frozen linear boundary then verifies compliance with unseen natural-language rules in a single forward pass.
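A hedged sketch of that single-forward-pass verification step follows; `embed` stands in for whichever sentence encoder produces the plan-plus-rationale embedding, and all names are assumptions rather than the authors' API:

```python
import torch

@torch.no_grad()
def verify(plan_text, rationale_text, rule_text, embed, proj, head):
    """One forward pass: embed plan + rationale + rule, project into the
    latent space, and read off the frozen linear boundary."""
    x = embed(f"{rule_text}\n{plan_text}\n{rationale_text}")  # (in_dim,)
    z = proj(x)                       # position in the latent space
    compliant = head(z).item() > 0.0  # which side of the boundary?
    return compliant, z
```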
Beyond binary classification, RepV provides a probabilistic guarantee on the likelihood of correct verification, derived from a plan's position in the latent space. This guarantee drives a refinement loop for the planner, improving rule compliance without human annotations. Empirical evaluations show that RepV improves compliance prediction accuracy by up to 15% compared to baseline methods while adding fewer than 0.2M parameters. Furthermore, our refinement framework outperforms ordinary fine-tuning baselines across various planning domains. These results show that safety-separable latent spaces offer a scalable, plug-and-play primitive for reliable neurosymbolic plan verification. Code and data are available at: https://repv-project.github.io/.
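The abstract does not spell out the guarantee formula, so the sketch below assumes the confidence is a monotone function of the signed distance from the frozen linear boundary, and shows how such a position-based guarantee could drive planner refinement; `planner` and `verify_fn` are hypothetical closures over the components above:

```python
import torch

def correctness_probability(z, head):
    """Map a latent position to a confidence score via its signed distance
    to the frozen boundary (an assumed surrogate for RepV's guarantee)."""
    w = head.weight.squeeze(0)               # boundary normal vector
    dist = (z @ w + head.bias) / w.norm()    # signed distance to boundary
    return torch.sigmoid(dist.abs()).item()  # farther => higher confidence

def refine_plan(planner, verify_fn, head, rule_text,
                threshold=0.9, max_tries=5):
    """Regenerate plans until a verified plan clears the confidence
    threshold; no human labels are involved at any step."""
    for _ in range(max_tries):
        plan, rationale = planner(rule_text)
        compliant, z = verify_fn(plan, rationale)
        if compliant and correctness_probability(z, head) >= threshold:
            return plan
    return None                              # flag for fallback handling
```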