🤖 AI Summary
This work addresses robotic multi-part assembly in unstructured environments, where spatial interference and contact uncertainty undermine existing approaches, which struggle to reason jointly about perception, decision-making, and physical execution. The authors propose an end-to-end neuro-symbolic framework that combines semantic instance recognition with large language model (LLM) guidance to hierarchically generate an optimal subassembly graph for each part pair, decoupling general strategies from edge cases. By integrating a lightweight discriminator, topology-consistent global sequence synthesis, and a dynamic behavior tree embedding atomic skills, the system enables force-aware closed-loop execution. This design mitigates logical hallucination and state-space explosion while supporting scalable, verifiable planning of complex assemblies. Evaluated on 100 real-world scenarios, it achieves a 97.00% offline global executability rate; on a UR3 robot under strong disturbances, it attains a 90% success rate with a 0.5 mm assembly tolerance.
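The "topology-consistent global sequence synthesis" step can be illustrated with a standard topological sort. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes each pair's subgraph is a linear chain of uniquely named actions (a hypothetical format), represents cross-pair interference as extra precedence edges, and uses Kahn's algorithm so that every pair's internal action order is preserved in the global sequence.

```python
import heapq
from collections import defaultdict


def merge_subgraphs(subgraphs, cross_pair_edges):
    """Merge per-pair action chains into one global sequence (hypothetical helper).

    subgraphs: dict mapping a pair id to its ordered list of action names.
    cross_pair_edges: (before, after) precedence constraints between pairs.
    Raises ValueError if the constraints form a cycle.
    """
    succ = defaultdict(list)
    indeg = {}
    for actions in subgraphs.values():
        for action in actions:
            indeg.setdefault(action, 0)
        # Chain edges keep each pair's internal behavior order intact.
        for before, after in zip(actions, actions[1:]):
            succ[before].append(after)
            indeg[after] += 1
    # Cross-pair edges encode interference-driven ordering between pairs.
    for before, after in cross_pair_edges:
        succ[before].append(after)
        indeg[after] += 1

    # Kahn's algorithm; the heap makes ties break deterministically by name.
    ready = [action for action, degree in indeg.items() if degree == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        action = heapq.heappop(ready)
        order.append(action)
        for nxt in succ[action]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(indeg):
        raise ValueError("cross-pair constraints form a cycle")
    return order
```

For example, `merge_subgraphs({"A": ["A:pick", "A:insert"], "B": ["B:pick", "B:insert"]}, [("B:insert", "A:insert")])` returns `["A:pick", "B:pick", "B:insert", "A:insert"]`: pair B is inserted before pair A, yet each pair still picks before it inserts.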
📝 Abstract
Multi-pair robotic assembly in unstructured environments faces spatial interference and contact uncertainty. Existing paradigms fail to bridge cognitive decision-making and physical execution: they either encounter state-space explosion and knowledge bottlenecks or suffer from logical hallucinations and topological conflicts. We propose an end-to-end neuro-symbolic framework that solves the problem hierarchically: it generates an optimal subgraph for each pair, decouples generality from edge cases, and then resolves cross-pair interference. Given an RGB-D view of the assembly scene from an eye-on-hand camera, the framework extracts the semantic identity and state of each instance and quantifies the scene for divergence calculation. For each pair, an LLM generates the optimal subgraph using only basic actions, which mitigates hallucination. Supportive actions for edge cases are then reasoned about and inserted by a lightweight discriminator; because this step is driven by the divergence between the quantified baseline scene and the current scene, it can be extended at low cost. The augmented subgraphs are topologically coordinated into global sequences while preserving each subgraph's internal behavioral coherence. Dynamic behavior trees embedding atomic skills close the force-aware execution loop. Offline evaluation on 100 real-world scenes achieves 97.00% global executability, outperforming classical and state-of-the-art planners. Real-robot deployment on a UR3 arm attains a 90% success rate with 0.5 mm tolerance under strong interference, demonstrating a unified and verifiable solution for complex autonomous assembly.
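How a behavior tree can close a force-aware execution loop is sketched below using the textbook Sequence and Fallback node semantics. This is a minimal illustration, not the paper's implementation: the skill names (`pick`, `insert`, `realign`), the simulated peak-force readings, and the 15 N contact limit are all hypothetical. If the guarded insert detects excessive contact force it fails, and the Fallback node triggers a realign-and-retry branch.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


class Action:
    """Leaf node wrapping an atomic skill (a zero-argument callable)."""

    def __init__(self, name, skill):
        self.name, self.skill = name, skill

    def tick(self, log):
        status = self.skill()
        log.append((self.name, status.value))
        return status


class Sequence:
    """Succeeds only if every child succeeds; stops at the first non-success."""

    def __init__(self, *children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            status = child.tick(log)
            if status is not Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Tries children in order; stops at the first child that does not fail."""

    def __init__(self, *children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            status = child.tick(log)
            if status is not Status.FAILURE:
                return status
        return Status.FAILURE


def make_force_guarded_insert(force_readings, limit_n):
    """Hypothetical insert skill: fails when the simulated peak force exceeds limit_n."""
    readings = iter(force_readings)

    def insert():
        peak_force = next(readings)  # simulated peak contact force in newtons
        return Status.SUCCESS if peak_force <= limit_n else Status.FAILURE

    return insert


def run_insertion_demo(force_readings, limit_n=15.0):
    """Tick a pick -> (insert | realign -> retry) tree once; return (status, log)."""

    def always_ok():
        return Status.SUCCESS

    insert = make_force_guarded_insert(force_readings, limit_n)
    tree = Sequence(
        Action("pick", always_ok),
        Fallback(
            Action("insert", insert),
            Sequence(Action("realign", always_ok), Action("insert_retry", insert)),
        ),
    )
    log = []
    return tree.tick(log), log
```

With readings `[25.0, 8.0]`, the first insert exceeds the limit and fails, the realign branch runs, and the retry succeeds, so the whole tree returns `Status.SUCCESS`. Keeping recovery as an explicit Fallback branch is what makes such a tree easy to inspect and extend with further edge-case skills.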