🤖 AI Summary
This work addresses exposure bias and efficiency bottlenecks in long-horizon robotic task planning under complex logical constraints. The authors propose a bilevel optimization framework grounded in imperative learning: an upper-level neural scorer learns to assess object importance, while a lower-level symbolic planner solves the planning problem within the score-pruned search space. To stabilize training and provide reliable feedback, they introduce a 3R recovery mechanism that runs Repair, Restart, and Rollback strategies in parallel. Evaluated on three benchmarks, the method achieves state-of-the-art performance, including an 80.04% reduction in failure rate and a 57.14% reduction in planning time. Its effectiveness is further validated in both simulation and real-world experiments on a quadruped-based mobile manipulator.
📝 Abstract
Task planning often suffers from severe efficiency bottlenecks when robots must reason over long-horizon action sequences under complex logical constraints, including object affordances, spatial relationships, and sequential action dependencies. Recent neuro-symbolic methods improve planning efficiency by learning object-importance scores to prune task-irrelevant objects, but they typically rely on fixed offline supervision generated from full search spaces. This creates a train-test mismatch: at deployment, the planner operates in pruned search spaces induced by the model's own imperfect predictions, leading to exposure bias and degraded planning performance. To address this challenge, we formulate object-importance learning for task planning as an imperative learning-based bilevel optimization problem. The upper level optimizes a neural scorer, while the lower level solves a symbolic planning problem in the score-pruned search space. To stabilize this learning process, we introduce a 3R strategy into the lower-level planning, using parallel Repair, Restart, and Rollback recovery to provide reliable and adaptive feedback for upper-level learning. Experiments on three challenging benchmarks demonstrate state-of-the-art performance, including an 80.04% reduction in failure rate and a 57.14% reduction in planning time. We further validate the framework on a quadruped-based mobile manipulator in simulation and the real world, demonstrating its potential for efficient and deployable neuro-symbolic task planning.
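To make the idea of planning in a score-pruned search space concrete, here is a minimal sketch under stated assumptions. The function names, the toy planner, the fixed scores, and the threshold-relaxation fallback are all hypothetical illustrations. The fallback is a deliberately simple stand-in and is not the paper's 3R mechanism, whose details the abstract does not specify.

```python
from typing import Dict, List, Optional, Set, Tuple


def prune_objects(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Keep only objects whose (hypothetical) importance score meets the threshold."""
    return {obj for obj, score in scores.items() if score >= threshold}


def toy_planner(objects: Set[str], required: Set[str]) -> Optional[List[str]]:
    """Stand-in for a symbolic planner (assumption): it succeeds only if
    every object the task needs survived pruning."""
    if not required <= objects:
        return None  # a needed object was pruned away, so planning fails
    return [f"use({obj})" for obj in sorted(required)]


def plan_with_relaxation(
    scores: Dict[str, float],
    required: Set[str],
    threshold: float = 0.5,
    decay: float = 0.5,
    max_tries: int = 5,
) -> Tuple[Optional[List[str]], Set[str]]:
    """Plan in the pruned space; on failure, lower the threshold and retry.
    This simple fallback is illustrative only, not the 3R strategy."""
    for _ in range(max_tries):
        kept = prune_objects(scores, threshold)
        plan = toy_planner(kept, required)
        if plan is not None:
            return plan, kept
        threshold *= decay  # admit more objects on the next attempt
    return None, set()


# Imperfect scores: "table" is needed but scored below the initial threshold.
scores = {"cup": 0.9, "table": 0.3, "lamp": 0.05, "sofa": 0.01}
plan, kept = plan_with_relaxation(scores, required={"cup", "table"})
print(plan, kept)
```

In this toy run the first attempt fails because the imperfect scorer prunes an object the task needs, which mirrors the train-test mismatch described above: a scorer supervised only on full search spaces never sees the failures its own pruning causes at deployment.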