🤖 AI Summary
This work addresses the limited generalization and risk of destructive collisions in existing methods for manipulating articulated objects, problems often caused by complex interactions between end-effectors and handles. To enable safe and generalizable manipulation, the authors propose the GSAM framework, which integrates visual perception, commonsense reasoning, and constraint generation. Specifically, a vision-based perceptual module estimates kinematic parameters, while a vision-language model fine-tuned with chain-of-thought reasoning refines these estimates using commonsense. A large language model encodes structural, pose, and obstacle-avoidance knowledge into interaction constraints that guide trajectory and pose planning. Evaluated across five object categories, 50 tasks, and randomized initial configurations, GSAM achieves a 36.0% higher success rate and a 3.1% lower standard deviation than the best baseline, demonstrating significantly improved generalization and safety.
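For a hinge task, the kinematic parameters the perceptual module estimates can be read as those of a revolute joint: a point on the rotation axis and the axis direction. The sketch below is a minimal illustration under that assumption, not GSAM's implementation. The function names `rotate_about_axis` and `hinge_waypoints` are hypothetical. It uses Rodrigues' rotation formula to show how those parameters fix the arc that a grasped handle must follow, which is the path trajectory and pose planning then has to respect.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("axis direction must be non-zero")
    return (v[0] / n, v[1] / n, v[2] / n)


def rotate_about_axis(p: Vec3, origin: Vec3, axis: Vec3, angle: float) -> Vec3:
    """Rotate point p by `angle` radians about the line through `origin` along `axis`.

    Hypothetical helper implementing Rodrigues' rotation formula:
    v_rot = v*cos(a) + (k x v)*sin(a) + k*(k . v)*(1 - cos(a)).
    """
    k = _normalize(axis)
    # Express p relative to a point on the hinge axis.
    v = (p[0] - origin[0], p[1] - origin[1], p[2] - origin[2])
    c, s = math.cos(angle), math.sin(angle)
    dot = k[0] * v[0] + k[1] * v[1] + k[2] * v[2]
    cross = (k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0])
    r = tuple(v[i] * c + cross[i] * s + k[i] * dot * (1.0 - c) for i in range(3))
    # Shift back to world coordinates.
    return (r[0] + origin[0], r[1] + origin[1], r[2] + origin[2])


def hinge_waypoints(handle: Vec3, origin: Vec3, axis: Vec3,
                    open_angle: float, steps: int) -> List[Vec3]:
    """Hypothetical helper: sample handle positions as a hinged part opens from 0 to open_angle."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return [rotate_about_axis(handle, origin, axis, open_angle * i / steps)
            for i in range(steps + 1)]


if __name__ == "__main__":
    # Vertical hinge through the origin, handle 1 m away, opened by 90 degrees.
    pts = hinge_waypoints((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2, 2)
    print([tuple(round(c, 3) for c in p) for p in pts])
    # prints [(1.0, 0.0, 0.0), (0.707, 0.707, 0.0), (0.0, 1.0, 0.0)]
```

Every waypoint stays at the same distance from the axis, so an error in the estimated axis translates directly into a handle path that fights the real hinge, which is one reason refining the raw estimates matters.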
📝 Abstract
Articulated object manipulation poses a unique challenge for service robots. Existing methods employ end-to-end policy learning, vision-motion planning, and large language models or vision-language models (LLMs/VLMs), but they often overlook the diversity of articulated objects and the complexity of interactions between the end-effector and the handle, leading to limited generalization and destructive collisions. To address this, we propose GSAM, a generalizable and safe robotic framework for articulated object manipulation. Specifically, a vision-based perceiver generates the kinematic parameters. Because the pre-trained markers in the perceiver yield raw estimates that may deviate from commonsense, we present a fine-tuned VLM-based refiner that uses chain-of-thought (CoT) commonsense reasoning to refine the perception. To prevent destructive collisions, we design an interaction constraint function generator that integrates knowledge of articulated objects, interaction poses, and obstacle avoidance into a knowledge base. An LLM then functionalizes these constraints and applies them to trajectory and posture planning. A kinematic-aware manipulation planner verifies the reachability of the trajectory and posture. Experiments on 50 hinge tasks across 5 object categories and 50 randomly initialized end-effector-handle configurations show that GSAM reduces standard deviation by 3.1% and improves manipulation success rate by 36.0% compared to the best baseline; these results demonstrate, respectively, the superior object generalization and interaction safety of GSAM in practical scenarios.
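The abstract describes constraints that an LLM turns into functions and applies to trajectory and posture planning. One common way to encode such constraints, assumed here purely for illustration, is as callables that return a violation value (at most zero when satisfied), which a planner evaluates at every waypoint. In this hypothetical sketch, `obstacle_clearance`, `within_reach`, and `first_violation` are invented names, obstacles are simplified to spheres, and the distance-to-base test is only a crude stand-in for a kinematic-aware reachability check, which would normally require inverse kinematics.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical convention: a constraint maps an end-effector position to a
# violation value; values <= 0 mean the constraint is satisfied.
Constraint = Callable[[Vec3], float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def obstacle_clearance(center: Vec3, radius: float, margin: float) -> Constraint:
    """Spherical obstacle: the end-effector must stay `margin` outside its surface."""
    return lambda p: (radius + margin) - _dist(p, center)


def within_reach(base: Vec3, max_reach: float) -> Constraint:
    """Crude reachability proxy: stay within max_reach of the robot base (no IK)."""
    return lambda p: _dist(p, base) - max_reach


def first_violation(path: List[Vec3],
                    constraints: Dict[str, Constraint]) -> Optional[Tuple[int, str]]:
    """Return (waypoint index, constraint name) of the first violation, or None."""
    for i, p in enumerate(path):
        for name, g in constraints.items():
            if g(p) > 0.0:
                return i, name
    return None


if __name__ == "__main__":
    constraints = {
        "clear_box": obstacle_clearance((0.5, 0.5, 0.0), 0.1, 0.05),
        "reach": within_reach((0.0, 0.0, 0.0), 1.2),
    }
    path = [(1.0, 0.0, 0.0), (0.6, 0.6, 0.0), (0.0, 1.0, 0.0)]
    print(first_violation(path, constraints))  # prints (1, 'clear_box')
```

Returning the failing waypoint and the constraint's name, rather than a bare boolean, is a design choice that gives a planner (or an LLM in the loop) something concrete to repair.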