PASG: A Closed-Loop Framework for Automated Geometric Primitive Extraction and Semantic Anchoring in Robotic Manipulation

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the disconnect between high-level task semantics and low-level geometric features in robotic manipulation, this paper proposes a closed-loop spatial-semantic joint reasoning framework. Methodologically, it integrates automatic geometric primitive extraction, a fine-tuned VLM (Qwen2.5VL-PA), semantic grounding, and a closed-loop feedback mechanism to enable annotation-free, dynamic semantic anchoring. The paper further introduces the first affordance-aware spatial-semantic joint benchmark, supporting cross-category keypoint and axis detection as well as fine-grained modeling of semantic–functional relationships. Experiments demonstrate that the approach performs on par with human-annotated baselines across diverse real-world manipulation tasks, substantially reducing annotation dependency while improving robots' autonomous understanding of object functional properties and task objectives.
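
As a rough illustration of the closed-loop structure described above, the sketch below organizes the extract → anchor → execute → feedback cycle as a single function. The component callables (`extract_primitives`, `anchor_semantics`, `execute_step`) are hypothetical placeholders standing in for the framework's stages, not the paper's released interface.

```python
# Minimal, hypothetical sketch of the closed-loop reasoning cycle described above.
# The component callables are assumptions about the framework's structure,
# not the authors' released API.

from typing import Any, Callable, Tuple


def pasg_loop(
    observation: Any,
    task: str,
    extract_primitives: Callable[[Any], Any],         # (1) geometric keypoints / axes
    anchor_semantics: Callable[[Any, str], Any],       # (2) VLM couples primitives with affordances
    execute_step: Callable[[Any], Tuple[Any, bool]],   # (3) returns (new observation, success flag)
    max_iters: int = 5,
) -> bool:
    """Run the extract -> anchor -> execute -> feedback loop until success."""
    success = False
    for _ in range(max_iters):
        primitives = extract_primitives(observation)   # (1) primitive extraction
        anchors = anchor_semantics(primitives, task)   # (2) semantic anchoring
        observation, success = execute_step(anchors)   # (3) grounded execution
        if success:                                    # (4) closed-loop feedback: stop or re-anchor
            break
    return success
```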

📝 Abstract
The fragmentation between high-level task semantics and low-level geometric features remains a persistent challenge in robotic manipulation. While vision-language models (VLMs) have shown promise in generating affordance-aware visual representations, the lack of semantic grounding in canonical spaces and the reliance on manual annotations severely limit their ability to capture dynamic semantic-affordance relationships. To address these limitations, we propose Primitive-Aware Semantic Grounding (PASG), a closed-loop framework that introduces: (1) Automatic primitive extraction through geometric feature aggregation, enabling cross-category detection of keypoints and axes; (2) VLM-driven semantic anchoring that dynamically couples geometric primitives with functional affordances and task-relevant descriptions; (3) A spatial-semantic reasoning benchmark and a fine-tuned VLM (Qwen2.5VL-PA). We demonstrate PASG's effectiveness in practical robotic manipulation tasks across diverse scenarios, achieving performance comparable to manual annotations. PASG achieves a finer-grained semantic-affordance understanding of objects, establishing a unified paradigm for bridging geometric primitives with task semantics in robotic manipulation.
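
For intuition on what "automatic primitive extraction through geometric feature aggregation" could look like, the generic sketch below derives candidate axes and keypoints from an object point cloud using simple covariance statistics. This is an assumed, simplified stand-in (principal axes plus extremal points), not PASG's actual extraction algorithm.

```python
# Illustrative only: a generic way to derive candidate axes and keypoints from an
# object point cloud via covariance-based feature aggregation. This is an assumption
# about what such extraction could look like, not PASG's actual procedure.

import numpy as np


def extract_candidate_primitives(points: np.ndarray):
    """points: (N, 3) object point cloud -> (axes, keypoints) candidates."""
    centroid = points.mean(axis=0)
    centered = points - centroid

    # Principal axes from the covariance of the point cloud
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)
    axes = eigvecs[:, ::-1].T                  # rows sorted: major, mid, minor axis

    # Keypoints: centroid plus extremal points along the major axis
    proj = centered @ axes[0]
    keypoints = np.stack([
        centroid,
        points[proj.argmax()],                 # e.g. tip / handle end
        points[proj.argmin()],                 # opposite end
    ])
    return axes, keypoints
```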
Problem

Research questions and friction points this paper is trying to address.

Bridging high-level task semantics and low-level geometric features in robotics
Lack of semantic grounding in canonical spaces for vision-language models
Dynamic semantic-affordance relationships in robotic manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic primitive extraction via geometric feature aggregation
VLM-driven semantic anchoring for dynamic affordance coupling (see the sketch after this list)
Spatial-semantic reasoning benchmark with fine-tuned VLM
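
The sketch below illustrates one way VLM-driven semantic anchoring could be wired: extracted primitives are serialized into a prompt, and the VLM is asked to return an affordance label and a task-relevant description for each primitive. The prompt format and the `query_vlm` callable are illustrative assumptions, not the authors' interface for Qwen2.5VL-PA.

```python
# Hypothetical sketch of passing extracted primitives to a VLM for semantic anchoring.
# The prompt format and the `query_vlm` callable are assumptions for illustration,
# not PASG's released interface.

import json
from typing import Callable, Dict, List


def anchor_semantics(
    primitives: Dict[str, List[List[float]]],   # e.g. {"keypoints": [...], "axes": [...]}
    task: str,
    query_vlm: Callable[[str], str],            # wraps Qwen2.5VL-PA or any other VLM
) -> dict:
    """Ask the VLM to couple each geometric primitive with a functional affordance."""
    prompt = (
        "Task: " + task + "\n"
        "Geometric primitives (object frame, as JSON):\n"
        + json.dumps(primitives, indent=2) + "\n\n"
        "For each keypoint and axis, return a JSON object mapping its index to a "
        "functional affordance (e.g. 'grasp point', 'pour axis') and a short "
        "task-relevant description."
    )
    # The VLM reply is expected to be JSON; real code would validate and retry.
    return json.loads(query_vlm(prompt))
```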
Zhihao Zhu
University of Science and Technology of China
Machine Learning Privacy · Recommender System · Graph Neural Network
Yifan Zheng
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
Siyu Pan
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
Yaohui Jin
Shanghai Jiao Tong University
Yao Mu
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University