Whole-Body Coordination for Dynamic Object Grasping with Legged Manipulators

📅 2025-08-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of whole-body coordinated dynamic grasping for quadrupedal robots in unstructured environments, this paper introduces DQ-Bench—the first systematic benchmark for dynamic grasping evaluation—and proposes DQ-Net, a teacher-student learning framework. DQ-Net jointly models static geometric and dynamic temporal features via dual-view representation, a grasp fusion module, and privileged information distillation, enabling closed-loop control from only lightweight observations: target masks, depth maps, and proprioceptive states. Experiments demonstrate that DQ-Net achieves significantly higher grasping success rates across diverse dynamic tasks than prior methods, while exhibiting superior responsiveness and cross-terrain robustness. This work establishes a reproducible evaluation standard and an efficient learning paradigm for embodied intelligent manipulation in dynamic scenarios.

📝 Abstract
Quadrupedal robots with manipulators offer strong mobility and adaptability for grasping in unstructured, dynamic environments through coordinated whole-body control. However, existing research has predominantly focused on static-object grasping, neglecting the challenges posed by dynamic targets and thus limiting applicability in dynamic scenarios such as logistics sorting and human-robot collaboration. To address this, we introduce DQ-Bench, a new benchmark that systematically evaluates dynamic grasping across varying object motions, velocities, heights, object types, and terrain complexities, along with comprehensive evaluation metrics. Building upon this benchmark, we propose DQ-Net, a compact teacher-student framework designed to infer grasp configurations from limited perceptual cues. During training, the teacher network leverages privileged information to holistically model both the static geometric properties and dynamic motion characteristics of the target, and integrates a grasp fusion module to deliver robust guidance for motion planning. Concurrently, we design a lightweight student network that performs dual-viewpoint temporal modeling using only the target mask, depth map, and proprioceptive state, enabling closed-loop action outputs without reliance on privileged data. Extensive experiments on DQ-Bench demonstrate that DQ-Net achieves robust dynamic-object grasping across multiple task settings, substantially outperforming baseline methods in both success rate and responsiveness.
Problem

Research questions and friction points this paper is trying to address.

Dynamic object grasping with legged manipulators in unstructured environments
Lack of benchmarks for dynamic grasping across varied conditions
Need for robust perception and control in dynamic scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

DQ-Bench benchmark evaluates dynamic grasping comprehensively
DQ-Net uses teacher-student framework for grasp inference
Lightweight student network enables closed-loop action outputs
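The core of the teacher-student idea is privileged-information distillation: a teacher trained with full state supervises a student that sees only lightweight observations. A minimal sketch of action-space distillation follows; the MLP sizes, the `distill_step` helper, and the MSE regression loss are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: the teacher sees privileged state (e.g. exact
# object pose and velocity); the student sees only lightweight observations
# (target mask + depth features + proprioception).
D_PRIV, D_OBS, D_HID, D_ACT = 32, 16, 64, 8

# Frozen teacher weights (stand-in for a policy already trained with
# privileged information).
t_w1 = rng.normal(size=(D_PRIV, D_HID)) * 0.1
t_w2 = rng.normal(size=(D_HID, D_ACT)) * 0.1

# Trainable student weights.
s_w1 = rng.normal(size=(D_OBS, D_HID)) * 0.1
s_w2 = rng.normal(size=(D_HID, D_ACT)) * 0.1

def teacher_action(priv):
    # Two-layer MLP: ReLU hidden layer, linear action head.
    return np.maximum(priv @ t_w1, 0.0) @ t_w2

def distill_step(priv, obs, lr=1e-3):
    """One action-space distillation step: the student regresses the
    teacher's action from non-privileged observations (MSE loss)."""
    global s_w1, s_w2
    target = teacher_action(priv)        # fixed supervision signal
    h = np.maximum(obs @ s_w1, 0.0)      # student hidden layer (ReLU)
    err = h @ s_w2 - target              # dL/dpred for 0.5 * MSE
    dh = (err @ s_w2.T) * (h > 0)        # backprop before updating weights
    s_w2 -= lr * h.T @ err
    s_w1 -= lr * obs.T @ dh
    return float(np.mean(err ** 2))

# Paired views of the same scenes: privileged state for the teacher,
# lightweight observations for the student.
priv = rng.normal(size=(64, D_PRIV))
obs = rng.normal(size=(64, D_OBS))
losses = [distill_step(priv, obs) for _ in range(300)]
print(losses[-1] < losses[0])            # distillation loss decreases
```

At deployment only the student's forward pass is needed, which is why the non-privileged branch can stay lightweight enough for closed-loop control.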
👥 Authors
Qiwei Liang — Hong Kong University of Science and Technology (Guangzhou)
Boyang Cai — Shenzhen University
Rongyi He — Shenzhen University
Hui Li — Shenzhen University
Tao Teng — Istituto Italiano di Tecnologia (IIT); Robotics
Haihan Duan — Associate Professor, Shenzhen MSU-BIT University; Multimedia, Blockchain, Human-Centered Computing, Decentralized AI, Metaverse
Changxin Huang — Assistant Professor, Shenzhen University; Robotics, Reinforcement Learning
Runhao Zeng — Shenzhen MSU-BIT University