🤖 AI Summary
This work addresses a problem in robotic manipulation: the tight coupling between grasp selection and motion planning obscures the true source of a failure, which leads to inefficient trial-and-error learning. To address this, the authors propose GTP-FA (Grasp-Then-Plan with Failure Attribution), a two-stage framework that first generates grasp candidates and then performs task-oriented motion planning conditioned on the selected grasp. The key component is a failure attribution model that generalizes to unseen grasps and outputs a distribution over failure modes. Its diagnoses drive the optimization of both stages. On the grasping side, a scoring function combines task-level priors with risk penalties to suppress unstable or task-incompatible grasps. On the planning side, data collection and fine-tuning target high-risk initial states. Applied on top of reinforcement learning, imitation learning, diffusion-policy, and vision-language-action (VLA) base learners, the framework improves task success rates in both simulation and real-robot experiments.
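The grasp-side scoring can be illustrated with a minimal sketch. It assumes that the failure attribution model returns a probability for each failure mode for every grasp candidate. It also assumes that the score equals the task prior minus a weighted sum of the probabilities of grasp-attributable modes. The mode names (`unstable_grasp`, `task_incompatible_grasp`, `planning_failure`), the penalty weights, and the linear form of the score are hypothetical choices for illustration, not the authors' formulation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical failure-mode labels; the abstract does not specify the taxonomy.
GRASP_MODES = ("unstable_grasp", "task_incompatible_grasp")


@dataclass
class GraspCandidate:
    name: str
    task_prior: float  # hypothetical task-level prior for this grasp, in [0, 1]
    failure_probs: Dict[str, float] = field(default_factory=dict)  # mode -> probability


def risk_aware_score(g: GraspCandidate, penalty_weights: Dict[str, float]) -> float:
    """Task prior minus weighted probability mass on grasp-attributable failure modes."""
    risk = sum(
        penalty_weights.get(mode, 1.0) * g.failure_probs.get(mode, 0.0)
        for mode in GRASP_MODES
    )
    return g.task_prior - risk


def select_grasp(candidates: List[GraspCandidate],
                 penalty_weights: Dict[str, float]) -> GraspCandidate:
    """Return the candidate with the highest risk-aware score."""
    return max(candidates, key=lambda g: risk_aware_score(g, penalty_weights))


if __name__ == "__main__":
    weights = {"unstable_grasp": 1.0, "task_incompatible_grasp": 2.0}
    a = GraspCandidate("A", 0.9, {"unstable_grasp": 0.5,
                                  "task_incompatible_grasp": 0.1,
                                  "planning_failure": 0.1})
    b = GraspCandidate("B", 0.7, {"unstable_grasp": 0.05,
                                  "task_incompatible_grasp": 0.05,
                                  "planning_failure": 0.3})
    for g in (a, b):
        print(g.name, round(risk_aware_score(g, weights), 3))
    print("selected:", select_grasp([a, b], weights).name)
```

In this example, candidate B is selected despite its lower prior, because A places most of its probability mass on an unstable grasp. Planning-attributable mass is deliberately left unpenalized in this sketch. The reasoning is that such failures are better handled on the planning side than by rejecting the grasp.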
📝 Abstract
In robotic manipulation, the tight coupling between grasping and motion planning often obscures the true source of failure, leading to inefficient trial-and-error. To enable efficient long-horizon manipulation, we propose GTP-FA (Grasp-Then-Plan with Failure Attribution), a task-oriented, two-stage framework that generates grasp candidates and then performs downstream motion planning conditioned on the selected grasp. We learn a failure attribution model that, given a failed manipulation trajectory, generalizes to unseen grasps and produces a stable distribution over failure modes to guide optimization. Based on these attribution results, we optimize both modules in a diagnosis-driven manner. On the grasping side, we inject task-level priors and risk penalties into grasp candidate scoring and optimization to suppress unstable or task-incompatible grasps. On the planning side, we target high-risk initial states through data collection and fine-tuning to address genuine planning bottlenecks. We evaluate the framework in both simulation and real-robot experiments and show that GTP-FA improves the corresponding base learners across reinforcement learning (RL), imitation learning (IL), diffusion-policy, and vision-language-action (VLA) settings, achieving substantially higher overall task success rates.
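The planning-side targeting of high-risk initial states can be sketched the same way. This hypothetical example assumes that each failed rollout is logged with an initial-state identifier and the attribution model's failure-mode distribution. For each initial state, it averages the probability attributed to planning failures. It then returns the top-k states at or above a threshold as candidates for additional data collection and fine-tuning. The function names, the mode label, the threshold, and the averaging rule are illustrative assumptions rather than details given in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical label for failures the attribution model assigns to motion planning.
PLANNING_MODES = ("planning_failure",)


def planning_risk(failure_probs: Dict[str, float]) -> float:
    """Probability mass the attribution model places on planning-side failure modes."""
    return sum(failure_probs.get(mode, 0.0) for mode in PLANNING_MODES)


def select_high_risk_states(
    failed_episodes: List[Tuple[str, Dict[str, float]]],
    k: int,
    threshold: float = 0.5,
) -> List[str]:
    """Average planning risk per initial state; return top-k states at or above threshold."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for state_id, probs in failed_episodes:
        totals[state_id] += planning_risk(probs)
        counts[state_id] += 1
    mean_risk = {s: totals[s] / counts[s] for s in totals}
    flagged = [s for s, r in mean_risk.items() if r >= threshold]
    flagged.sort(key=lambda s: mean_risk[s], reverse=True)  # riskiest first
    return flagged[:k]


if __name__ == "__main__":
    episodes = [
        ("s1", {"planning_failure": 0.9, "unstable_grasp": 0.1}),
        ("s1", {"planning_failure": 0.7, "unstable_grasp": 0.3}),
        ("s2", {"planning_failure": 0.2, "unstable_grasp": 0.8}),
        ("s3", {"planning_failure": 0.6, "unstable_grasp": 0.4}),
    ]
    print(select_high_risk_states(episodes, k=2))
```

In this example, `s2` is excluded because its failures are attributed mainly to the grasp, so collecting more planning data from that state would not address the actual cause. Averaging over repeated failures from the same initial state keeps a single diagnosis from dominating. A real pipeline might instead weight states by how often they fail.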