PathFormer: A Transformer with 3D Grid Constraints for Digital Twin Robot-Arm Trajectory Generation

📅 2025-10-22
🤖 AI Summary
Existing sequence models neglect the 3D spatiotemporal structure of robotic motion (i.e., *where*, *what*, and *when*), often yielding invalid or suboptimal trajectories. To address this, we propose a digital-twin-oriented Transformer model with 3D grid constraints. Methodologically, we introduce a three-grid (where/what/when) representation coupled with a constraint-aware masked decoding mechanism, integrating task graph reasoning and structured path modeling to ensure spatial continuity, geometric feasibility, and task adaptability, while also enabling local relocalization. Technically, the framework unifies 3D grid encoding, graph-enhanced task understanding, and simulation-to-reality transfer. Experiments on the xArm Lite 6 platform demonstrate up to a 97.5% reaching success rate and a 92.5% grasping success rate in controlled tests, and 86.7% end-to-end success across 60 diverse language instructions in cluttered scenes; moreover, 99.99% of generated trajectories are legal by construction, respecting the lattice-adjacency and workspace-bound constraints.

📝 Abstract
Robotic arms require precise, task-aware trajectory planning, yet sequence models that ignore motion structure often yield invalid or inefficient executions. We present a Path-based Transformer that encodes robot motion with a 3-grid (where/what/when) representation and constraint-masked decoding, enforcing lattice-adjacent moves and workspace bounds while reasoning over task graphs and action order. Trained on 53,755 trajectories (80% train / 20% validation), the model aligns closely with ground truth (89.44% stepwise accuracy, 93.32% precision, 89.44% recall, and 90.40% F1), with 99.99% of paths legal by construction. Compiled to motor primitives on an xArm Lite 6 with a depth-camera digital twin, it attains up to 97.5% reach and 92.5% pick success in controlled tests, and 86.7% end-to-end success across 60 language-specified tasks in cluttered scenes, absorbing slips and occlusions via local re-grounding without global re-planning. These results show that path-structured representations enable Transformers to generate accurate, reliable, and interpretable robot trajectories, bridging graph-based planning and sequence-based learning and providing a practical foundation for general-purpose manipulation and sim-to-real transfer.
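The abstract reports identical values for stepwise accuracy and recall (both 89.44%). One plausible reading, which the abstract does not confirm, is that precision, recall, and F1 are support-weighted averages over move classes: support-weighted recall is algebraically equal to accuracy. The sketch below illustrates that identity on toy data; the move labels (`x+`, `z-`, `stay`) and the function name `weighted_metrics` are hypothetical and not taken from the paper.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), the last three support-weighted."""
    n = len(y_true)
    support = Counter(y_true)      # number of true steps per class
    predicted = Counter(y_pred)    # number of predicted steps per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class weight = share of true steps
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return accuracy, precision, recall, f1

# Toy stepwise labels (hypothetical move vocabulary, not the paper's).
y_true = ["x+", "x+", "y+", "z-", "z-", "z-", "stay", "x+"]
y_pred = ["x+", "y+", "y+", "z-", "z-", "x+", "stay", "x+"]

acc, prec, rec, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Under this averaging scheme, accuracy and recall always coincide, while precision and F1 can differ from them, which matches the pattern of the reported numbers.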
Problem

Research questions and friction points this paper is trying to address.

Generating precise robot-arm trajectories using structured 3D grid representations
Enforcing motion constraints and workspace bounds during trajectory planning
Bridging graph-based planning with sequence learning for reliable manipulation
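
As a rough illustration of what "lattice-adjacent moves and workspace bounds" can mean for a path, the following sketch checks a sequence of grid cells for legality. It assumes integer cell coordinates, a box-shaped workspace given by its grid shape, and 6-connected adjacency (exactly one axis changes by one cell per step); the paper's actual grid resolution and adjacency rule are not given here, so all names and parameters are illustrative.

```python
from typing import Sequence, Tuple

Cell = Tuple[int, int, int]

def in_bounds(cell: Cell, shape: Cell) -> bool:
    """True if every coordinate lies inside the workspace grid."""
    return all(0 <= c < s for c, s in zip(cell, shape))

def is_adjacent(a: Cell, b: Cell) -> bool:
    """True if b is a 6-connected neighbour of a (Manhattan distance 1)."""
    return sum(abs(x - y) for x, y in zip(a, b)) == 1

def is_legal_path(path: Sequence[Cell], shape: Cell) -> bool:
    """A path is legal if all cells are in bounds and consecutive cells are adjacent."""
    if not all(in_bounds(cell, shape) for cell in path):
        return False
    return all(is_adjacent(a, b) for a, b in zip(path, path[1:]))

workspace = (4, 4, 4)  # hypothetical 4 x 4 x 4 grid
good_path = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
jump_path = [(0, 0, 0), (2, 0, 0)]    # skips a cell
escape_path = [(3, 3, 3), (4, 3, 3)]  # leaves the workspace

print(is_legal_path(good_path, workspace))
print(is_legal_path(jump_path, workspace))
print(is_legal_path(escape_path, workspace))
```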
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer with 3D grid constraints
Constraint-masked decoding for valid motions
Path-structured representation bridging planning and learning
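
Constraint-masked decoding can be sketched as follows: before choosing the next move, the scores of all moves that would leave the workspace or enter a blocked cell are set to negative infinity, so only legal moves can be selected. This is a minimal greedy sketch under assumptions, not the paper's implementation: the six-move vocabulary, the `blocked` set, and the function names are hypothetical, and the logits are supplied by hand rather than by a trained Transformer.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Cell = Tuple[int, int, int]

# Hypothetical move vocabulary: one unit step along each axis.
MOVES: Dict[str, Cell] = {
    "x+": (1, 0, 0), "x-": (-1, 0, 0),
    "y+": (0, 1, 0), "y-": (0, -1, 0),
    "z+": (0, 0, 1), "z-": (0, 0, -1),
}

def step(cell: Cell, move: str) -> Cell:
    """Apply a named move to a cell."""
    return tuple(c + d for c, d in zip(cell, MOVES[move]))

def legal_moves(cell: Cell, shape: Cell, blocked: FrozenSet[Cell]) -> List[str]:
    """Moves that stay inside the grid and avoid blocked cells."""
    legal = []
    for name in MOVES:
        nxt = step(cell, name)
        if all(0 <= v < s for v, s in zip(nxt, shape)) and nxt not in blocked:
            legal.append(name)
    return legal

def masked_greedy_step(cell, logits, shape, blocked=frozenset()):
    """Mask illegal moves to -inf, then pick the highest-scoring legal move."""
    allowed = set(legal_moves(cell, shape, blocked))
    if not allowed:
        raise ValueError("no legal move from this cell")
    masked = {m: (s if m in allowed else -math.inf) for m, s in logits.items()}
    best = max(masked, key=masked.get)
    return best, step(cell, best)

# At the grid corner, the two highest raw scores ("x-", "y-") are illegal.
logits = {"x+": 0.1, "x-": 2.0, "y+": 0.3, "y-": 1.5, "z+": 0.9, "z-": -0.2}
move, nxt = masked_greedy_step((0, 0, 0), logits, shape=(3, 3, 3))
print(move, nxt)
```

Because the mask is applied before selection, every emitted step is legal regardless of what the model scores highest, which is one way a decoder can produce paths that are valid by construction.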