Closed-Loop Neural Activation Control in Vision-Language-Action Models

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing test-time neural steering methods for vision-language-action (VLA) models: they apply a fixed intervention coefficient, which amounts to open-loop control. A fixed coefficient cannot adapt as task state and concept error change during an episode, and this often leads to overcorrection, oscillation, and reduced task success, especially for temporal behaviors such as speed and smoothness. The authors propose CTRL-STEER, a framework that brings closed-loop control to VLA neural intervention without modifying or retraining the underlying model. CTRL-STEER decouples representation from regulation: it steers along motion-aligned residual directions while a feedback controller adjusts the intervention magnitude online. The controller is instantiated in both PID and reinforcement-learning-based variants. In experiments with a fine-tuned OpenVLA policy on four LIBERO task suites, CTRL-STEER achieves more stable concept regulation and a better trade-off between steering and task success than fixed-coefficient baselines.

📝 Abstract

Vision-Language-Action (VLA) models can be steered at test time by intervening on semantically meaningful internal directions, but existing methods use a fixed steering coefficient, effectively operating in open loop. This is poorly suited to embodied control, where task state and concept error evolve over time, often causing overcorrection, oscillation, and reduced task success, especially for temporal behaviors such as speed and smoothness. We propose CTRL-STEER, a closed-loop framework that replaces static intervention strength with adaptive, time-varying control signals. The key idea is to decouple representation from regulation: rather than assuming temporal concepts are directly controlled by individual neurons, we steer along motion-aligned residual directions while a feedback controller adjusts intervention magnitude online. We instantiate this framework with both PID and reinforcement learning based controllers. Experiments with a fine-tuned OpenVLA policy on four LIBERO task suites show that CTRL-STEER achieves more stable concept regulation and a better steering-task success trade-off than fixed-coefficient baselines, without modifying or retraining the base model.
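To make the closed-loop idea concrete, here is a minimal sketch of a textbook discrete PID loop that sets a steering coefficient from the gap between a target and a measured concept value (for example, end-effector speed), then adds that coefficient times a unit direction to a hidden-state vector. It is an illustration under stated assumptions, not the paper's implementation: the names `PIDSteeringController` and `steer`, the gains, the clamp range, and the use of a scalar concept measurement are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PIDSteeringController:
    """Textbook discrete PID loop producing a steering coefficient.

    Hypothetical helper: gains and clamp limits are illustrative assumptions,
    not values taken from the paper.
    """
    kp: float
    ki: float
    kd: float
    alpha_min: float = -5.0  # assumed lower bound on intervention strength
    alpha_max: float = 5.0   # assumed upper bound on intervention strength
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, target: float, measured: float, dt: float = 1.0) -> float:
        """Return the clamped coefficient for the current control step."""
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        alpha = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp so a large error cannot produce an unbounded intervention.
        return max(self.alpha_min, min(self.alpha_max, alpha))


def steer(hidden: List[float], direction: List[float], alpha: float) -> List[float]:
    """Add alpha times the unit-normalised direction to a hidden-state vector."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("steering direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]
```

In a rollout, `update` would be called once per control step with the latest concept measurement, and its output passed to `steer` in place of a fixed coefficient. The fixed-coefficient baseline corresponds to skipping `update` and using the same `alpha` at every step.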

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models

closed-loop control

concept regulation

embodied control

temporal behaviors

Innovation

Methods, ideas, or system contributions that make the work stand out.

closed-loop control

neural activation steering

vision-language-action models

adaptive intervention

feedback regulation

🔎 Similar Papers

Modelling Multimodal Integration in Human Concept Processing with Vision-Language Models

2024-07-25 · Citations: 0

Interpreting Neurons in Deep Vision Networks with Language Models

2024-03-20 · Citations: 5