Uncovering Vulnerability of Vision-Language-Action Models under Joint-Level Physical Faults

📅 2026-06-09

🤖 AI Summary

This work addresses the limited robustness of vision-language-action (VLA) models deployed on real robots under joint-level physical faults, such as actuator degradation or increased friction, which cause action deviations and task failures. The study systematically characterizes the heterogeneous, joint-dependent performance degradation caused by such faults and introduces J-PARC, a lightweight online calibration framework. Without retraining or modifying the frozen VLA policy, J-PARC infers a latent fault regime from recent joint dynamics and generates residual action corrections. The approach improves task success rates across diverse joint faults while preserving baseline performance in fault-free conditions.
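The idea of correcting a frozen policy's output from recent joint dynamics can be illustrated with a minimal sketch. This is not the J-PARC implementation: it swaps the learned latent fault-regime encoder for an explicit per-joint least-squares gain estimate, and every name here (`estimate_joint_gains`, `residual_correction`, `calibrated_action`, `min_gain`, `max_residual`) is hypothetical. It also assumes a fault acts as a simple per-joint scaling of the commanded action.

```python
import numpy as np

def estimate_joint_gains(commanded, realized, eps=1e-8):
    """Per-joint least-squares gain g such that realized ~= g * commanded.

    commanded, realized: arrays of shape (T, num_joints) from a recent window.
    A gain near 1 suggests a healthy joint; a gain below 1 suggests degradation.
    """
    commanded = np.asarray(commanded, dtype=float)
    realized = np.asarray(realized, dtype=float)
    num = np.sum(commanded * realized, axis=0)
    den = np.sum(commanded * commanded, axis=0) + eps
    return num / den

def residual_correction(action, gains, min_gain=0.2, max_residual=1.0):
    """Residual that compensates the estimated gain, clipped for safety."""
    action = np.asarray(action, dtype=float)
    safe_gains = np.clip(gains, min_gain, None)  # avoid dividing by tiny gains
    residual = action * (1.0 / safe_gains - 1.0)
    return np.clip(residual, -max_residual, max_residual)

def calibrated_action(base_action, commanded_hist, realized_hist):
    """Frozen-policy action plus a residual; the policy itself is untouched."""
    gains = estimate_joint_gains(commanded_hist, realized_hist)
    return np.asarray(base_action, dtype=float) + residual_correction(base_action, gains)

# Toy data: joint 1 (zero-indexed) realizes only half of each command.
rng = np.random.default_rng(0)
true_gains = np.array([1.0, 0.5, 1.0])
commanded_hist = rng.normal(size=(20, 3))
realized_hist = commanded_hist * true_gains

base_action = np.array([0.2, 0.2, 0.2])  # stand-in for the frozen VLA output
corrected = calibrated_action(base_action, commanded_hist, realized_hist)
realized_after = corrected * true_gains
```

In this toy setup the residual is zero for healthy joints, so the frozen policy's action passes through unchanged when there is no fault. Only the degraded joint receives a correction.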

📝 Abstract

Deploying Vision-Language-Action (VLA) models in real robotic systems requires robustness not only to semantic and perceptual variations, but also to embodiment-side faults that change how actions are physically realized. Real robots can experience joint-level changes caused by actuator degradation, hardware faults, safety limits, collision damage, or wear-induced friction. These faults are critical because they alter the action-to-motion interface of a policy, disrupting the learned closed-loop relationship between commanded actions, realized motion, and subsequent observations. In this work, we study realistic joint-level physical faults and show that VLA models are vulnerable when predicted actions are executed through a perturbed robot body. Our analysis reveals joint-dependent effects, with heterogeneous degradation in task success across affected joints. We also show that performance drops cannot be attributed solely to physical infeasibility, since feasible faults such as increased joint friction can still substantially reduce success rates and induce closed-loop execution mismatch. Motivated by these findings, we propose Joint-level Physical-fault Aware Residual Calibrator (J-PARC), a lightweight residual calibration framework built on top of a frozen VLA policy. J-PARC infers a latent joint-fault regime from recent joint dynamics and conditions a shared residual calibrator on this regime, enabling adaptive action correction across faulty joints. Experiments show that J-PARC improves robustness under joint-level faults while preserving fault-free environment performance.
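The claim that a physically feasible fault can still reduce success can be illustrated with a toy single-joint simulation. This is a sketch under stated assumptions, not the paper's fault model: the joint is velocity-controlled with viscous friction, the "policy" is a clipped proportional controller, and the episode has a fixed step budget. The names `realized_velocity` and `run_episode` and all parameter values are hypothetical.

```python
def realized_velocity(command, stiffness=10.0, friction=0.0):
    """Steady-state velocity of a velocity-controlled joint with viscous friction.

    The torque balance stiffness * (command - v) = friction * v gives
    v = command * stiffness / (stiffness + friction).
    """
    return command * stiffness / (stiffness + friction)

def run_episode(target, friction, steps=20, max_step=0.1, dt=1.0):
    """Closed-loop rollout: command a clipped step toward the target each tick."""
    q = 0.0
    for _ in range(steps):
        command = max(-max_step, min(max_step, target - q))
        q += dt * realized_velocity(command, friction=friction)
    return q

target = 1.5
q_healthy = run_episode(target, friction=0.0)            # reaches the target
q_faulty = run_episode(target, friction=10.0)            # falls short within budget
q_faulty_long = run_episode(target, friction=10.0, steps=200)  # still reachable
```

With the friction coefficient equal to the stiffness, each command realizes half its intended motion, so the joint covers only two thirds of the distance within the 20-step budget. Given more steps it still converges, which is the sense in which the fault is feasible yet harmful in this toy setting.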

Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models

physical faults

joint-level faults

robotic robustness

action-to-motion mismatch

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action Models

Joint-Level Physical Faults

Residual Calibration