GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

📅 2026-05-28

🤖 AI Summary
This work addresses two weaknesses of existing methods for manipulating articulated objects: limited generalization across diverse objects and destructive collisions. The authors attribute both to methods overlooking object diversity and the complexity of interactions between end-effectors and handles. To enable safe and generalizable manipulation, they propose GSAM, a framework that integrates visual perception, commonsense reasoning, and constraint generation. A vision-based perceiver estimates kinematic parameters, and a vision-language model fine-tuned with chain-of-thought reasoning refines estimates that deviate from commonsense. An interaction constraint function generator draws on a knowledge base of articulated-object, interaction-pose, and obstacle-avoidance knowledge, and a large language model turns this knowledge into constraint functions that guide trajectory and posture planning; a kinematic-aware planner then verifies reachability. Evaluated on 50 hinge tasks across five object categories and 50 randomly initialized end-effector–handle configurations, GSAM achieves a 36.0% higher manipulation success rate and a 3.1% lower standard deviation than the best baseline, which the authors present as evidence of improved object generalization and interaction safety.
📝 Abstract
Articulated object manipulation is a unique challenge for service robots. Existing methods employ end-to-end policy learning, vision-motion planning, and large language models or vision-language models (LLMs/VLMs), but often overlook the diversity of articulated objects and the complexity of interactions between end-effector and handle, leading to limited generalization and destructive collisions. To address this, we propose GSAM, a generalizable and safe robotic framework for articulated object manipulation. Specifically, a vision-based perceiver generates the kinematic parameters. Because the pre-trained markers in the perceiver yield raw estimations that may deviate from commonsense, we present a fine-tuned VLM-based refiner that uses chain-of-thought (CoT) commonsense reasoning to refine perception. To prevent destructive collisions, we design an interaction constraint function generator that integrates articulated object, interaction pose, and obstacle avoidance knowledge into a knowledge base. An LLM then functionalizes these constraints and applies them to trajectory and posture planning. A kinematic-aware manipulation planner verifies the reachability of the trajectory and posture. Experiments on 50 hinge tasks across 5 object categories and 50 randomly initialized end-effector–handle configurations show that GSAM reduces standard deviation by 3.1% and improves manipulation success rate by 36.0% compared to the best baseline, demonstrating, respectively, the superior object generalization and interaction safety of GSAM in practical scenarios.
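The abstract does not say how kinematic parameters or interaction constraints are represented internally. As a minimal sketch, assuming a hinge is described by a point on its rotation axis plus the axis direction, the handle's opening arc can be sampled with Rodrigues' rotation formula and checked against a simple obstacle-clearance constraint. The function names, the point-obstacle model, and the fixed distance margin are hypothetical illustrations, not GSAM's actual constraint functions or planner.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a: Vec3) -> float:
    return math.sqrt(_dot(a, a))


def rotate_about_axis(p: Vec3, axis_origin: Vec3, axis_dir: Vec3, angle: float) -> Vec3:
    """Rotate point p by `angle` radians about the line through axis_origin along axis_dir."""
    k = _scale(axis_dir, 1.0 / _norm(axis_dir))  # unit hinge axis
    v = _sub(p, axis_origin)                     # point relative to the axis
    c, s = math.cos(angle), math.sin(angle)
    # Rodrigues: v_rot = v*cos + (k x v)*sin + k*(k . v)*(1 - cos)
    v_rot = _add(_add(_scale(v, c), _scale(_cross(k, v), s)),
                 _scale(k, _dot(k, v) * (1.0 - c)))
    return _add(axis_origin, v_rot)


def hinge_handle_waypoints(handle: Vec3, axis_origin: Vec3, axis_dir: Vec3,
                           open_angle: float, steps: int) -> List[Vec3]:
    """Hypothetical helper: sample steps + 1 handle positions from 0 to open_angle."""
    return [rotate_about_axis(handle, axis_origin, axis_dir, open_angle * i / steps)
            for i in range(steps + 1)]


def satisfies_clearance(waypoints: Sequence[Vec3], obstacles: Sequence[Vec3],
                        margin: float) -> bool:
    """Hypothetical obstacle-avoidance constraint: each waypoint stays >= margin from each obstacle point."""
    return all(_norm(_sub(w, o)) >= margin for w in waypoints for o in obstacles)


if __name__ == "__main__":
    # Cabinet door hinged along the z-axis at the origin, handle 0.4 m from the hinge.
    arc = hinge_handle_waypoints((0.4, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                                 math.pi / 2, 10)
    print(arc[-1])  # approximately (0.0, 0.4, 1.0) after a 90-degree opening
    print(satisfies_clearance(arc, [(0.0, 0.45, 1.0)], 0.1))  # obstacle too close to the final pose
```

Sampling the arc from an explicit axis keeps the handle at a constant distance from the hinge, so a constraint check only has to reason about discrete waypoints; a real planner would also need to handle end-effector orientation and arm reachability, which this sketch omits.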
Problem

Research questions and friction points this paper is trying to address.

articulated object manipulation
generalization
destructive collisions
robotic safety
object diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalizable manipulation
Safe robotic interaction
Vision-language model (VLM)
Chain-of-thought reasoning
Kinematic-aware planning
Beichen Shao
College of Computer Science, Chongqing University, Chongqing, China
Mengying Xie
College of Computer Science, Chongqing University, Chongqing, China
Heng Su
Tsinghua University
Super-resolution, Computer Vision, Image Processing
Wanyi Zhang
College of Computer Science, Chongqing University, Chongqing, China
Mingyan Li
College of Computer Science, Chongqing University, Chongqing, China
Yan Ding
Lumos Robotics, China; Xi’an Jiaotong-Liverpool University, China; Fudan University, China
Fausto Giunchiglia
Professor of Computer Science, Università di Trento
Computational theories of the mind
Chao Chen
PhD, Department of Electrical and Computer Engineering, University of Miami
Data Mining, Multimedia Information Retrieval