🤖 AI Summary
Existing Vision-Language-Action (VLA) models rely on static viewpoints and shared visual encoders, resulting in weak 3D perception and cross-task representation interference that limit robustness and generalization. To address this, we propose a task-aware view planning framework that actively selects informative observation viewpoints and pairs them with a Mixture-of-Experts (MoE) visual encoder that disentangles task-specific features. We further introduce a pseudo-environment to accelerate view-policy training, enabling dynamic and discriminative visual representation learning. On the RLBench multi-task manipulation benchmark, our method significantly outperforms fixed-view baselines, improving both action prediction accuracy and task success rate. These results demonstrate strong generalization to complex manipulation tasks and practical efficacy in real-world robotic settings.
📝 Abstract
Recent vision-language-action (VLA) models for multi-task robotic manipulation commonly rely on static viewpoints and shared visual encoders, which limit 3D perception and cause task interference, hindering robustness and generalization. In this work, we propose Task-Aware View Planning (TAVP), a framework designed to overcome these challenges by integrating active view planning with task-specific representation learning. TAVP employs an efficient exploration policy, accelerated by a novel pseudo-environment, to actively acquire informative views. Furthermore, we introduce a Mixture-of-Experts (MoE) visual encoder to disentangle features across different tasks, boosting both representation fidelity and task generalization. By learning to see the world in a task-aware way, TAVP generates more complete and discriminative visual representations, demonstrating significantly enhanced action prediction across a wide array of manipulation challenges. Extensive experiments on RLBench tasks show that our proposed TAVP model achieves superior performance over state-of-the-art fixed-view approaches. Visual results and code are provided at: https://hcplab-sysu.github.io/TAVP.
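To make the task-aware routing idea concrete, the sketch below shows a minimal Mixture-of-Experts visual encoder in which a task embedding gates a sparse subset of expert projections, so different tasks need not share one entangled representation. This is an illustrative toy, not the paper's implementation: the class name, dimensions, and top-k routing scheme are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoEVisualEncoder:
    """Toy MoE encoder (hypothetical sketch): a task embedding routes
    visual features through a sparse subset of expert projections."""

    def __init__(self, feat_dim=16, out_dim=8, n_experts=4, top_k=2):
        self.experts = [rng.standard_normal((feat_dim, out_dim)) * 0.1
                        for _ in range(n_experts)]
        self.router = rng.standard_normal((feat_dim, n_experts)) * 0.1
        self.top_k = top_k

    def encode(self, visual_feat, task_embed):
        # Route on the task embedding so each task favors its own experts.
        logits = task_embed @ self.router
        top = np.argsort(logits)[-self.top_k:]   # indices of the chosen experts
        gates = softmax(logits[top])             # renormalize gates over the top-k
        # Gated sum of the selected experts' projections of the visual features.
        return sum(g * (visual_feat @ self.experts[i])
                   for g, i in zip(gates, top))

enc = MoEVisualEncoder()
feat = rng.standard_normal(16)
# The same visual input, routed under two different task embeddings,
# yields two task-specific codes.
z_a = enc.encode(feat, task_embed=rng.standard_normal(16))
z_b = enc.encode(feat, task_embed=rng.standard_normal(16))
```

Because only `top_k` of the `n_experts` projections are active per task, gradients for one task mostly update its own experts, which is the mechanism that mitigates cross-task interference.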