Constrained Decoding for Robotics Foundation Models

📅 2025-09-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Robot foundation models often generate actions that violate behavioral correctness and safety constraints, compromising operational safety and logical consistency. Method: We propose the first constrained decoding framework tailored for robot foundation models. It uses Signal Temporal Logic (STL) as a formal specification language to verify and correct action sequences in real time during decoding, without requiring model retraining. The framework supports multimodal inputs, conditional action generation, and dynamic intervention, and is compatible with mainstream navigation foundation models. Contribution/Results: Experiments demonstrate that our approach filters unsafe actions, improves task success rates, and enhances cross-scenario generalization. To the best of our knowledge, this is the first work to enable plug-and-play integration of STL constraints into the inference pipeline of robot foundation models, thereby ensuring safety and logical fidelity in open-world robotic deployment.

📝 Abstract

Recent advances in the development of robotic foundation models have led to promising end-to-end and general-purpose capabilities in robotic systems. These models are pretrained on vast datasets of robot trajectories to process multimodal inputs and directly output a sequence of actions that the system then executes in the real world. Although this approach is attractive from the perspective of improved generalization across diverse tasks, these models are still data-driven and, therefore, lack explicit notions of behavioral correctness and safety constraints. We address these limitations by introducing a constrained decoding framework for robotics foundation models that enforces logical constraints on action trajectories in dynamical systems. Our method ensures that generated actions provably satisfy signal temporal logic (STL) specifications at runtime without retraining, while remaining agnostic to the underlying foundation model. We perform a comprehensive evaluation of our approach across state-of-the-art navigation foundation models and show that our decoding-time interventions are useful not only for filtering unsafe actions but also for conditional action generation. Videos are available on our website: https://constrained-robot-fms.github.io
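To make the notion of "satisfying an STL specification" concrete, the sketch below scores a 2D trajectory against a simple specification using the standard quantitative (robustness) semantics of STL: "always" takes a minimum over time, "eventually" takes a maximum, and conjunction takes a minimum. The specification itself (avoid a circular obstacle, eventually reach a goal), the function names, and all numeric values are illustrative assumptions and are not taken from the paper.

```python
import math

# Hypothetical example spec: G(dist to obstacle > radius) AND F(dist to goal < tol).
# A positive robustness value means the trajectory satisfies the spec.

def always(values):
    """Robustness of 'globally phi': the worst-case margin over time."""
    return min(values)

def eventually(values):
    """Robustness of 'eventually phi': the best-case margin over time."""
    return max(values)

def avoid_margin(point, center, radius):
    """Signed margin of the predicate 'point is outside the obstacle'."""
    return math.dist(point, center) - radius

def reach_margin(point, goal, tol):
    """Signed margin of the predicate 'point is within tol of the goal'."""
    return tol - math.dist(point, goal)

def robustness(traj, obstacle, radius, goal, tol):
    safe = always([avoid_margin(p, obstacle, radius) for p in traj])
    reach = eventually([reach_margin(p, goal, tol) for p in traj])
    return min(safe, reach)  # conjunction of the two sub-formulas

obstacle, radius = (2.0, 2.0), 0.5
goal, tol = (3.0, 0.0), 0.2
traj_ok = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
traj_bad = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 0.0)]

rho_ok = robustness(traj_ok, obstacle, radius, goal, tol)
rho_bad = robustness(traj_bad, obstacle, radius, goal, tol)
print(rho_ok > 0, rho_bad > 0)
```

Here the first trajectory stays clear of the obstacle and ends at the goal, so its robustness is positive, while the second passes through the obstacle center and is rejected even though it also reaches the goal.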

Problem

Research questions and friction points this paper is trying to address.

Enforcing safety constraints on robot action trajectories

Ensuring actions satisfy signal temporal logic specifications

Providing runtime safety without retraining foundation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrained decoding enforces logical action constraints

Ensures runtime STL specification satisfaction without retraining

Agnostic framework works with various foundation models
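The points above can be illustrated with a deliberately simplified, model-agnostic filter. This is not the paper's algorithm, which intervenes during decoding; it is a rejection-style sketch that assumes the model has already proposed several candidate action sequences ranked by preference. The single-integrator dynamics, the velocity action space, the obstacle, and every function name here are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Action = Tuple[float, float]  # assumed planar velocity command (vx, vy)

def rollout(start: Point, actions: Sequence[Action], dt: float = 1.0) -> List[Point]:
    """Integrate assumed single-integrator dynamics: x_next = x + dt * a."""
    traj = [start]
    for vx, vy in actions:
        x, y = traj[-1]
        traj.append((x + dt * vx, y + dt * vy))
    return traj

def always_avoid(traj: Sequence[Point], center: Point, radius: float) -> float:
    """Robustness of 'always stay outside the obstacle' (min margin over time)."""
    return min(math.dist(p, center) - radius for p in traj)

def constrained_select(
    start: Point,
    candidates: Sequence[Sequence[Action]],
    spec: Callable[[List[Point]], float],
) -> Optional[Sequence[Action]]:
    """Return the highest-ranked candidate whose rollout satisfies the spec."""
    for actions in candidates:
        if spec(rollout(start, actions)) > 0:
            return actions
    return None  # caller must fall back, e.g. to a stop command

# Candidates stand in for model outputs, ordered by model preference.
candidates = [
    [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)],   # drives straight through the obstacle
    [(1.0, 1.0), (1.0, 0.0), (1.0, -1.0)],  # detours around it
]
spec = lambda traj: always_avoid(traj, (2.0, 0.0), 0.5)
chosen = constrained_select((0.0, 0.0), candidates, spec)
print(chosen)
```

Because the filter only needs candidate action sequences and a dynamics model, it never touches the model's weights, which is the sense in which such an approach requires no retraining. Note that this sketch checks the constraint only at sampled states, so it does not by itself give the provable guarantee the abstract describes.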

🔎 Similar Papers

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey