UniStateDLO: Unified Generative State Estimation and Tracking of Deformable Linear Objects Under Occlusion for Constrained Manipulation

📅 2025-12-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Perception of deformable linear objects (DLOs) in constrained environments is unreliable because of severe occlusion, high-dimensional state spaces, textureless surfaces, and sensor noise. Method: The paper proposes what it describes as the first end-to-end unified generative framework that formulates both single-frame DLO state estimation and inter-frame tracking as conditional diffusion generation tasks. The approach employs a partial point cloud encoder and a jointly optimized network, trained exclusively on synthetic data, and generalizes zero-shot from simulation to the real world. It handles initial occlusion, self-occlusion, and multi-object occlusion within the same framework. Contribution/Results: The method outperforms state-of-the-art approaches in both estimation and tracking, in simulation and in real-world settings. It delivers real-time DLO state predictions that are globally smooth yet locally accurate, and it enables stable closed-loop manipulation in complex 3D constrained environments without real-world fine-tuning or domain-specific heuristics.

📝 Abstract

Perception of deformable linear objects (DLOs), such as cables, ropes, and wires, is the cornerstone for successful downstream manipulation. Although vision-based methods have been extensively explored, they remain highly vulnerable to occlusions that commonly arise in constrained manipulation environments due to surrounding obstacles, large and varying deformations, and limited viewpoints. Moreover, the high dimensionality of the state space, the lack of distinctive visual features, and the presence of sensor noise further compound the challenges of reliable DLO perception. To address these open issues, this paper presents UniStateDLO, the first complete DLO perception pipeline with deep-learning methods that achieves robust performance under severe occlusion, covering both single-frame state estimation and cross-frame state tracking from partial point clouds. Both tasks are formulated as conditional generative problems, leveraging the strong capability of diffusion models to capture the complex mapping between highly partial observations and high-dimensional DLO states. UniStateDLO effectively handles a wide range of occlusion patterns, including initial occlusion, self-occlusion, and occlusion caused by multiple objects. In addition, it exhibits strong data efficiency as the entire network is trained solely on a large-scale synthetic dataset, enabling zero-shot sim-to-real generalization without any real-world training data. Comprehensive simulation and real-world experiments demonstrate that UniStateDLO outperforms all state-of-the-art baselines in both estimation and tracking, producing globally smooth yet locally precise DLO state predictions in real time, even under substantial occlusions. Its integration as the front-end module in a closed-loop DLO manipulation system further demonstrates its ability to support stable feedback control in complex, constrained 3-D environments.

Problem

Research questions and friction points this paper is trying to address.

Robust perception of deformable linear objects under occlusion

Unified generative state estimation and tracking from partial point clouds

Zero-shot sim-to-real generalization without real-world training data
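
To make the "partial point cloud under occlusion" setting concrete, the following is a minimal sketch of how a synthetic partial observation of a DLO could be produced. It is not the paper's data pipeline. The curve shape, the noise level, and the axis-aligned box standing in for an obstacle are all assumptions chosen for illustration, and real occlusion depends on the camera viewpoint rather than on a fixed box.

```python
import numpy as np

def sample_dlo_centerline(num_nodes=50):
    """Hypothetical DLO shape: a gently curved 3D line of ordered nodes."""
    s = np.linspace(0.0, 1.0, num_nodes)
    return np.stack(
        [s, 0.1 * np.sin(2 * np.pi * s), 0.05 * np.cos(2 * np.pi * s)], axis=1
    )

def occlude(points, box_min, box_max):
    """Drop points inside an axis-aligned box (an assumed stand-in for an obstacle)."""
    inside = np.all((points >= box_min) & (points <= box_max), axis=1)
    return points[~inside]

def make_partial_cloud(centerline, noise_std=0.002,
                       box_min=(0.4, -1.0, -1.0), box_max=(0.6, 1.0, 1.0), seed=0):
    """Add Gaussian sensor noise, then remove the occluded segment."""
    rng = np.random.default_rng(seed)
    noisy = centerline + rng.normal(0.0, noise_std, centerline.shape)
    return occlude(noisy, np.asarray(box_min), np.asarray(box_max))

centerline = sample_dlo_centerline()
partial = make_partial_cloud(centerline)
print(centerline.shape, partial.shape)  # the partial cloud has fewer points
```

The full node sequence plays the role of the ground-truth state, while the partial cloud is what a perception model would actually see.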

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified generative pipeline for DLO state estimation and tracking

Diffusion models map partial observations to high-dimensional DLO states

Zero-shot sim-to-real generalization using synthetic dataset training
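
The idea of generating a DLO state conditioned on a partial observation can be sketched with a standard DDPM-style reverse sampling loop. This is an illustration under stated assumptions, not the authors' architecture: `encode_partial_cloud` and `toy_denoiser` are hypothetical placeholders for the learned encoder and noise-prediction network, and the linear noise schedule, step count, and node count are arbitrary.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (values are illustrative assumptions)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def encode_partial_cloud(points):
    """Hypothetical encoder stand-in: centroid and per-axis spread of the cloud."""
    return np.concatenate([points.mean(axis=0), points.std(axis=0)])

def toy_denoiser(x_t, t, cond):
    """Hypothetical stand-in for a trained noise-prediction network.

    It returns an array shaped like x_t. A real model would be learned from
    data; this placeholder only pulls the nodes toward the observed centroid.
    """
    return x_t - cond[:3]

def sample_dlo_state(partial_cloud, num_nodes=20, num_steps=50, seed=0):
    """DDPM ancestral sampling of node positions, conditioned on the cloud."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    cond = encode_partial_cloud(partial_cloud)
    x = rng.standard_normal((num_nodes, 3))  # start from pure Gaussian noise
    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_denoiser(x, t, cond)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise  # no noise is added at the final step
    return x

cloud = np.random.default_rng(1).normal(size=(100, 3))
state = sample_dlo_state(cloud)
print(state.shape)
```

With a trained denoiser in place of the placeholder, the same loop would turn noise into an ordered set of 3D node positions consistent with the observed points. For tracking, the conditioning could also include the previous frame's state, although the exact conditioning used in the paper is not specified here.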

🔎 Similar Papers

Shape-Space Deformer: Unified Visuo-Tactile Representations for Robotic Manipulation of Deformable Objects