🤖 AI Summary
Existing robotics foundation models lack formal guarantees for temporal specifications (such as time-bounded goal reaching, task sequencing, and persistent safety conditions) when executing natural language instructions. This work proposes a specification-aware action distribution optimization framework that enforces hard constraints expressed in Signal Temporal Logic (STL) at inference time, without fine-tuning the pretrained model's parameters. At each decision step, the method computes a minimally corrected action distribution, using forward dynamics propagation over the remaining horizon to check that the STL specification can still be satisfied. The authors describe it as, to the best of their knowledge, the first approach to embed hard STL constraints directly into the inference process of foundation models, so that complex spatiotemporal requirements are enforced with no additional training. Experiments across multiple simulated environments demonstrate that the framework satisfies diverse STL specifications while preserving the original model's task performance.
📝 Abstract
Robotics foundation models have demonstrated strong capabilities in executing natural language instructions across diverse tasks and environments. However, they remain largely data-driven and lack formal guarantees of safety or of satisfying time-dependent specifications during deployment. In practice, robots often need to comply with operational constraints involving rich spatiotemporal requirements such as time-bounded goal visits, sequential objectives, and persistent safety conditions. In this work, we propose a specification-aware action distribution optimization framework that enforces a broad class of Signal Temporal Logic (STL) constraints during execution of a pretrained robotics foundation model without modifying its parameters. At each decision step, the method computes a minimally modified action distribution that satisfies a hard STL feasibility constraint by reasoning over the remaining horizon using forward dynamics propagation. We validate the proposed framework in simulation using a state-of-the-art robotics foundation model across multiple environments and complex specifications.
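The per-step correction described above can be illustrated with a minimal sketch. This is not the paper's implementation: the 1D integer world, the deterministic dynamics in `step`, the constants `GOAL`, `UNSAFE`, and `ACTIONS`, and the function names are all hypothetical, chosen only to keep the example small. The sketch also assumes that "minimally modified" means smallest KL divergence from the pretrained distribution. For a discrete action set, that correction reduces to zeroing out infeasible actions and renormalizing. The specification encoded here is "eventually reach `GOAL` within the remaining horizon, and always stay out of `UNSAFE`".

```python
from functools import lru_cache

ACTIONS = (-1, 0, 1)         # hypothetical discrete action set
GOAL = 4                     # toy spec: eventually reach x == GOAL in time
UNSAFE = frozenset({-2})     # toy spec: always keep x outside UNSAFE


def step(x, a):
    """Assumed deterministic forward dynamics: integer position plus action."""
    return x + a


@lru_cache(maxsize=None)
def satisfiable(x, steps_left, reached):
    """Return True if some action sequence from state x can still satisfy
    the spec, given steps_left remaining steps and whether the goal has
    already been visited."""
    if x in UNSAFE:
        return False                      # safety condition violated
    reached = reached or x == GOAL
    if steps_left == 0:
        return reached                    # horizon exhausted
    # Propagate the dynamics one step for every action and recurse.
    return any(satisfiable(step(x, a), steps_left - 1, reached)
               for a in ACTIONS)


def correct_distribution(probs, x, steps_left, reached=False):
    """Mask actions after which the spec is no longer satisfiable, then
    renormalize. probs maps each action to its probability under the
    (here hypothetical) pretrained policy."""
    if steps_left < 1:
        raise ValueError("no decision steps remain")
    reached = reached or x == GOAL
    feasible = {a: p for a, p in probs.items()
                if satisfiable(step(x, a), steps_left - 1, reached)}
    mass = sum(feasible.values())
    if mass == 0:
        raise ValueError("no action with nonzero probability keeps the "
                         "specification satisfiable")
    return {a: feasible.get(a, 0.0) / mass for a in probs}


policy = {-1: 0.2, 0: 0.5, 1: 0.3}        # made-up policy output
tight = correct_distribution(policy, x=0, steps_left=4)
loose = correct_distribution(policy, x=0, steps_left=5)
print(tight)
print(loose)
```

With four steps left and the goal four cells away, only moving right keeps the deadline reachable, so all probability mass shifts to that action. With five steps left, waiting is also feasible and the two surviving actions keep their original ratio. A continuous action space or a learned dynamics model would need a different feasibility check and a different projection than this enumeration.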