Bridging Semantics and Physical Execution: A Neuro-Symbolic Framework for Multi-Pair Robotic Assembly

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses robotic multi-part assembly in unstructured environments, where spatial interference and contact uncertainty make it hard for existing approaches to reason jointly about perception, decision-making, and physical execution. The authors propose an end-to-end neuro-symbolic framework that uses semantic instance recognition and large language model guidance to generate an optimal subassembly graph for each part pair, separating general strategies from edge cases. A lightweight discriminator, topology-consistent global sequence synthesis, and a dynamic behavior tree with embedded atomic skills together enable force-aware closed-loop execution. The approach mitigates logical hallucination and state-space explosion while supporting scalable, verifiable planning for complex assemblies. In offline evaluation on 100 real-world scenarios, it achieves a 97.00% global executability rate; deployed on a UR3 robot under strong disturbances, it attains a 90% success rate with a 0.5 mm assembly tolerance.
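To make the idea of a behavior tree with embedded atomic skills more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the node classes, the skill functions `insert` and `realign`, the `state` dictionary, and the force values are all hypothetical, chosen only to show how a fallback branch can react to a force reading during execution.

```python
SUCCESS, FAILURE = "success", "failure"


class Action:
    """Leaf node wrapping one atomic skill (a function of the shared state)."""

    def __init__(self, name, skill):
        self.name = name
        self.skill = skill

    def tick(self, state):
        return SUCCESS if self.skill(state) else FAILURE


class Sequence:
    """Succeeds only if every child succeeds, ticking children in order."""

    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS


class Fallback:
    """Succeeds as soon as one child succeeds, ticking children in order."""

    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE


# Hypothetical atomic skills; a real system would command the robot here.
def insert(state):
    # Insertion counts as successful only while contact force stays in bounds.
    return state["force_n"] <= state["force_limit_n"]


def realign(state):
    # Simulated recovery: realigning the part lowers the contact force.
    state["force_n"] = 2.0
    return True


def build_tree():
    # Try a direct insertion; on excessive force, realign and retry.
    return Fallback([
        Action("insert", insert),
        Sequence([Action("realign", realign), Action("insert", insert)]),
    ])


if __name__ == "__main__":
    state = {"force_n": 12.0, "force_limit_n": 5.0}
    print(build_tree().tick(state))
```

In this sketch the force reading is the only feedback signal; the "dynamic" aspect described in the summary (restructuring the tree at run time) is not modeled.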

📝 Abstract

Multi-pair robotic assembly in unstructured environments faces spatial interference and contact uncertainties. Existing paradigms fail to bridge cognitive decision-making and physical execution, as they either encounter state-space explosion and knowledge bottlenecks or suffer from logical hallucinations and topological conflicts. We propose an end-to-end neuro-symbolic framework that solves the challenge hierarchically: generating an optimal subgraph for each pair, decoupling generality from edge cases, and then resolving cross-pair interferences. Given an eye-on-hand RGB-D view of the assembly scene, the framework extracts the semantic identity and state of each instance and quantifies the scene for divergence calculation. For each pair, an optimal subgraph is generated by an LLM using only basic actions, which mitigates hallucinations. Supportive actions for edge cases are inferred and inserted by a lightweight discriminator. Because the discriminator is driven by the divergence between the quantified baseline and the current scene, it can be extended at low cost. The augmented subgraphs are topologically coordinated into global sequences while preserving their internal behavioral coherence. Dynamic behavior trees embedding atomic skills close the force-aware execution loop. Offline evaluation on 100 real-world scenes achieves 97.00% global executability, outperforming classical and state-of-the-art planners. Real-robot deployment on a UR3 arm attains a 90% success rate with 0.5 mm tolerance under strong interference, demonstrating a unified and verifiable solution for complex autonomous assembly.
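The step of coordinating per-pair subgraphs into a global sequence can be illustrated with a plain topological sort. The sketch below is an assumption-laden illustration, not the paper's algorithm: the `coordinate` function, the representation of a subgraph as an ordered list of action names, the cross-pair constraint tuples, and the example action names are all hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def coordinate(subgraphs, cross_pair_constraints):
    """Merge per-pair action sequences into one global order.

    subgraphs: dict mapping a pair name to its ordered list of action
        names (hypothetical format).
    cross_pair_constraints: iterable of (before, after) action names
        expressing interference between pairs (hypothetical format).
    Returns a list of actions, or None if the constraints are cyclic.
    """
    sorter = TopologicalSorter()
    for actions in subgraphs.values():
        for action in actions:
            sorter.add(action)
        # Keep each pair's internal order: every action depends on
        # the action immediately before it in the same subgraph.
        for earlier, later in zip(actions, actions[1:]):
            sorter.add(later, earlier)
    for before, after in cross_pair_constraints:
        sorter.add(after, before)
    try:
        return list(sorter.static_order())
    except CycleError:
        # Conflicting constraints: no executable global sequence exists.
        return None


if __name__ == "__main__":
    subgraphs = {
        "peg_hole": ["pick_peg", "insert_peg"],
        "gear_shaft": ["pick_gear", "mount_gear"],
    }
    # Hypothetical interference: the gear blocks the hole once mounted,
    # so the peg must be inserted before the gear is picked up.
    constraints = [("insert_peg", "pick_gear")]
    print(coordinate(subgraphs, constraints))
```

A topological sort guarantees that within-pair ordering and cross-pair constraints are respected, but it may interleave actions from different pairs; keeping each pair's actions contiguous would need an additional rule. Returning `None` on a cycle corresponds to detecting a topological conflict rather than emitting an unexecutable plan.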

Problem

Research questions and friction points this paper is trying to address.

multi-pair robotic assembly

spatial interference

contact uncertainties

cognitive-physical bridging

unstructured environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic

multi-pair assembly

logical hallucination mitigation