VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches to robotic skill evolution lack formal guarantees of temporal safety in untested scenarios. This work proposes a closed-loop framework that encodes large language model–generated skills as semantic contracts and drives their self-evolution through formal verification. According to the authors, it is the first method to close the loop between reusable skill contracts and formal verification: a model checker verifies skill-induced plans against temporal logic specifications, and when verification fails, the resulting counterexample trace is translated into a textual gradient that refines the contract, without modifying the underlying model weights. Experiments on Clearpath Jackal and PX4 quadcopter platforms show that the approach reaches 97.2% compliance with formal specifications using fewer than 100 optimization samples, outperforming execution-feedback, prompt-optimization, and fine-tuning baselines.

📝 Abstract

Reusable robot skills are becoming the basic units through which embodied agents turn open-ended instructions into long-horizon physical behavior. We argue that, while foundation models have collapsed the cost of creating these skills, the cost of trusting them has not. Existing skill-evolution loops refine skills through execution feedback, unit tests, environment reward, or LLM self-critique, but these signals provide only trace-level evidence: they show that a skill worked on sampled executions, not that skill-induced plans satisfy temporal safety contracts under untested conditions. We introduce VASO, a framework for verification-guided self-evolution of LLM-generated robot skill contracts. In VASO, each skill is represented as a semantic contract with two coupled interfaces: a formal interface that aligns robot states, observations, and control commands with logical propositions for model checking, and a planner-facing interface that guides executable behavior generation. A model checker first filters logically inconsistent skill contracts, then verifies plans induced by the skill against global and local temporal specifications. When verification fails, VASO translates the counterexample trace into a textual gradient that updates the reusable skill contract while keeping foundation-model weights frozen. On Clearpath Jackal and PX4 quadcopter tasks, VASO reaches 97.2% formal-specification compliance using fewer than 100 optimization samples, outperforming execution-feedback, prompt-optimization, and fine-tuning baselines. To our knowledge, VASO is the first framework that closes the loop between formal verification and self-evolving LLM-generated skills for physical AI agents: formal counterexamples become optimization feedback for reusable robot skill contracts, rather than merely verifying one-off plans, tuning planner prompts, or fine-tuning model weights.
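The verify-then-refine loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`SkillContract`, `toy_planner`, `first_violation`, `textual_gradient`, `evolve`) is hypothetical, the LLM planner is replaced by a deterministic stub, and the model checker is replaced by a simple check of one assumed specification, G(near_obstacle → X slow), over a finite trace. The sketch only shows how a counterexample might be turned into text that updates the contract while no model weights change.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Observation = Dict[str, float]   # raw robot readings (hypothetical fields)
Labels = Dict[str, bool]         # truth values of atomic propositions


@dataclass
class SkillContract:
    """Toy stand-in for a skill contract with two coupled interfaces."""
    name: str
    # Formal interface: maps raw readings to logical propositions.
    propositions: Dict[str, Callable[[Observation], bool]]
    # Planner-facing interface: text guidance consumed by the planner.
    guidance: List[str] = field(default_factory=list)


def toy_planner(contract: SkillContract, steps: int = 5) -> List[Observation]:
    """Stub for an LLM planner: it slows near obstacles only if told to."""
    knows_rule = any("slow" in g for g in contract.guidance)
    distance, speed = 3.0, 1.0
    plan: List[Observation] = []
    for _ in range(steps):
        plan.append({"distance": distance, "speed": speed})
        near = distance < 1.5
        speed = 0.2 if (near and knows_rule) else 1.0
        distance = max(distance - 1.0, 0.5)
    return plan


def label(contract: SkillContract, plan: List[Observation]) -> List[Labels]:
    """Apply the formal interface to obtain a propositional trace."""
    return [{p: fn(obs) for p, fn in contract.propositions.items()}
            for obs in plan]


def first_violation(trace: List[Labels], trigger: str,
                    required: str) -> Optional[int]:
    """Check G(trigger -> X required) on a finite trace.

    Returns the index of the first violating step, or None if none exists.
    A real model checker would explore a full state space instead.
    """
    for i in range(len(trace) - 1):
        if trace[i][trigger] and not trace[i + 1][required]:
            return i
    return None


def textual_gradient(step: int, trigger: str, required: str) -> str:
    """Turn a counterexample into text feedback for the contract."""
    return (f"Counterexample: '{trigger}' held at step {step} but "
            f"'{required}' did not hold at step {step + 1}. "
            f"Ensure '{required}' holds right after '{trigger}'.")


def evolve(contract: SkillContract,
           max_rounds: int = 3) -> Tuple[SkillContract, int]:
    """Refine the contract until the check passes or rounds run out."""
    for round_index in range(max_rounds):
        trace = label(contract, toy_planner(contract))
        step = first_violation(trace, "near_obstacle", "slow")
        if step is None:
            return contract, round_index
        # Only the contract text changes; the planner itself is untouched.
        contract.guidance.append(
            textual_gradient(step, "near_obstacle", "slow"))
    return contract, max_rounds


def make_contract() -> SkillContract:
    """Build a toy contract with assumed thresholds for both propositions."""
    return SkillContract(
        name="approach_goal",
        propositions={
            "near_obstacle": lambda obs: obs["distance"] < 1.5,
            "slow": lambda obs: obs["speed"] <= 0.3,
        },
    )


if __name__ == "__main__":
    contract, rounds = evolve(make_contract())
    print(rounds, contract.guidance)
```

With these toy values, the first round fails at step 2, the feedback is appended to the planner-facing guidance, and the second round passes the check. The design point the sketch tries to capture is that the feedback is attached to the reusable contract rather than to a single plan or to the planner's weights.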

Problem

Research questions and friction points this paper is trying to address.

formal verification

robot skills

temporal safety contracts

self-evolving AI

physical AI agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

formal verification

self-evolving skills

semantic contracts