High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects

📅 2025-03-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of high-precision visual alignment of small objects (e.g., screws) for humanoid robots, this paper proposes a dual-view closed-loop visual servoing framework. It fuses images from head and torso-mounted cameras with head joint angles, employing a Transformer architecture augmented with a distance estimation module and a multi-perception head to enable collaborative modeling of heterogeneous sensory features. The method achieves, for the first time in close-range scenarios, real-time pose estimation and servo control with sub-millimeter accuracy (0.8–1.3 mm), attaining success rates of 93%–100% on M4–M8 screw manipulation tasks—significantly outperforming conventional approaches. Key innovations include joint-state-embedded cross-view fusion of vision and proprioception, and a distance-sensitive visual servoing paradigm explicitly optimized for micro-manipulation.

Technology Category

Application Category

📝 Abstract
High-precision tiny object alignment remains a common and critical challenge for humanoid robots in real-world. To address this problem, this paper proposes a vision-based framework for precisely estimating and controlling the relative position between a handheld tool and a target object for humanoid robots, e.g., a screwdriver tip and a screw head slot. By fusing images from the head and torso cameras on a robot with its head joint angles, the proposed Transformer-based visual servoing method can correct the handheld tool's positional errors effectively, especially at a close distance. Experiments on M4-M8 screws demonstrate an average convergence error of 0.8-1.3 mm and a success rate of 93%-100%. Through comparative analysis, the results validate that this capability of high-precision tiny object alignment is enabled by the Distance Estimation Transformer architecture and the Multi-Perception-Head mechanism proposed in this paper.
Problem

Research questions and friction points this paper is trying to address.

High-precision alignment of tiny objects for humanoid robots
Vision-based framework for tool-object position estimation
Transformer-based method for correcting positional errors effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based visual servoing for precision alignment
Fuses head and torso camera images with joint angles
Distance Estimation Transformer and Multi-Perception-Head mechanism
🔎 Similar Papers
No similar papers found.
J
Jialong Xue
Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei, Anhui 230026, China
W
Wei Gao
Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei, Anhui 230026, China
Y
Yu Wang
Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei, Anhui 230026, China
C
Chao Ji
Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei, Anhui 230026, China
Dongdong Zhao
Dongdong Zhao
School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu 730000, China
Shi Yan
Shi Yan
Eindhoven University of Technology
Optical communicationfiber opticsSignal processing
Shiwu Zhang
Shiwu Zhang
University of Science and Technology of China
RoboticsSmart MaterialsTerradynamics