Distracted Robot: How Visual Clutter Undermines Robotic Manipulation

📅 2025-11-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the impact of visual clutter on the manipulation performance of vision-language-action (VLA) models. Addressing the lack of quantifiable clutter modeling and systematic evaluation in prior work, we introduce a psychophysics-inspired clutter metric that integrates distractor count, occlusion severity, and spatial distribution characteristics. We conduct unified benchmarking of mainstream VLA models across photorealistic simulation and physical robot platforms, revealing divergent vulnerability patterns and a consistent decline in task success under clutter. We further demonstrate that clutter degree significantly predicts performance degradation, by up to 34%, and show that standard fine-tuning fails to uniformly mitigate the diverse effects of clutter. Our core contribution is the first interpretable, reproducible visual clutter assessment framework for VLA models, providing both theoretical grounding and empirical benchmarks to guide the design of robust multimodal robotic systems.
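The summary names three ingredients of the clutter metric (distractor count, occlusion severity, spatial distribution) but the page does not give its formula. A minimal illustrative sketch, assuming a weighted combination of a saturating count term, mean pairwise bounding-box overlap as an occlusion proxy, and covered workspace area as a spread proxy (the function name, weights, and box format are all hypothetical, not the paper's actual measure):

```python
import math

def clutter_score(distractors, workspace_area, w=(0.4, 0.4, 0.2)):
    """Hypothetical clutter measure: weighted sum of a normalized
    distractor count, mean pairwise occlusion (IoU), and the fraction
    of the workspace covered. `distractors` is a list of axis-aligned
    (x, y, width, height) bounding boxes in workspace units."""
    n = len(distractors)
    if n == 0:
        return 0.0

    # 1) Distractor count, squashed into [0, 1) so it saturates.
    count_term = 1.0 - math.exp(-n / 5.0)

    # 2) Mean pairwise intersection-over-union as an occlusion proxy.
    def iou(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        union = aw * ah + bw * bh - inter
        return inter / union if union > 0 else 0.0

    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    occ_term = (sum(iou(distractors[i], distractors[j]) for i, j in pairs)
                / len(pairs)) if pairs else 0.0

    # 3) Spatial distribution proxy: covered area relative to the workspace.
    area_term = min(1.0, sum(bw * bh for _, _, bw, bh in distractors)
                    / workspace_area)

    return w[0] * count_term + w[1] * occ_term + w[2] * area_term
```

Under this sketch the score grows with more distractors and with heavier mutual occlusion, which matches the qualitative behavior the summary describes (higher clutter degree predicting larger performance drops).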

📝 Abstract
In this work, we propose an evaluation protocol for examining the performance of robotic manipulation policies in cluttered scenes. Contrary to prior works, we approach evaluation from a psychophysical perspective and therefore use a unified measure of clutter that accounts for environmental factors as well as the distractors' quantity, characteristics, and arrangement. Using this measure, we systematically construct evaluation scenarios in both hyper-realistic simulation and the real world and conduct extensive experiments on manipulation policies, in particular vision-language-action (VLA) models. Our experiments highlight the significant impact of scene clutter, which lowers policy performance by as much as 34%, and show that despite achieving similar average performance across the tasks, different VLA policies have unique vulnerabilities and relatively low agreement on success scenarios. We further show that our clutter measure is an effective indicator of performance degradation and analyze the impact of distractors in terms of their quantity and occluding influence. Finally, we show that fine-tuning on enhanced data, although effective, does not equally remedy all of clutter's negative impacts on performance.
Problem

Research questions and friction points this paper is trying to address.

Evaluates robotic manipulation in cluttered scenes using psychophysical measures
Analyzes how visual clutter degrades performance of vision-language-action models
Shows fine-tuning partially mitigates but does not fully solve clutter issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Psychophysical clutter measure for robotic evaluation
Systematic scenario construction in simulation and reality
Clutter measure indicates performance degradation effectively
Amir Rasouli
Noah's Ark Laboratory
Robotics · Computer Vision · Autonomous Driving · Visual Attention
Montgomery Alban
Huawei Technologies Canada
Sajjad Pakdamansavoji
Zhiyuan Li
Zhanguang Zhang
Aaron Wu
Xuan Zhao
PhD, Forschungszentrum Jülich GmbH
XAI · Fair AI