FEWT: Improving Humanoid Robot Perception with Frequency-Enhanced Wavelet-based Transformers

📅 2025-09-14
🤖 AI Summary
This study addresses the limited perceptual representation capability of humanoid robots in embodied intelligence. It proposes the Frequency-Enhanced Wavelet-based Transformer (FEWT), an imitation learning framework that jointly models time-domain and frequency-domain features: the Time-Series Discrete Wavelet Transform (TS-DWT) module performs multi-scale temporal decomposition, while the Frequency-Enhanced Efficient Multi-Scale Attention (FE-EMA) module combines wavelet decomposition with a residual network to fuse information across scales. Compared with the ACT baseline, FEWT improves task success rates by up to 30% in simulation and by 6–12% in real-world experiments, which the authors attribute to improved model robustness. The core contribution is the integration of wavelet analysis with a Transformer architecture, so that frequency-domain information guides multi-scale attention.

📝 Abstract
Embodied intelligence bridges the physical world and information space. As a typical physical embodiment of it, humanoid robots have shown great promise in recent years through robot learning algorithms. In this study, a hardware platform comprising a humanoid robot and an exoskeleton-style teleoperation cabin was developed to enable intuitive remote manipulation and efficient collection of anthropomorphic action data. To improve the perceptual representation of the humanoid robot, an imitation learning framework termed Frequency-Enhanced Wavelet-based Transformer (FEWT) is proposed. It consists of two primary modules: Frequency-Enhanced Efficient Multi-Scale Attention (FE-EMA) and Time-Series Discrete Wavelet Transform (TS-DWT). By combining multi-scale wavelet decomposition with a residual network, FE-EMA dynamically fuses features from the time domain and the frequency domain. This fusion captures feature information across multiple scales, thereby enhancing model robustness. Experiments show that FEWT improves the success rate of the state-of-the-art baseline, Action Chunking with Transformers (ACT), by up to 30% in simulation and by 6-12% in real-world experiments.
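
The multi-scale wavelet decomposition mentioned in the abstract can be illustrated with a small, generic example. The sketch below is not the paper's TS-DWT module: it assumes the Haar wavelet and a one-dimensional signal (for instance, one joint angle over time), and the function names `haar_wavedec` and `haar_waverec` are hypothetical. It only shows how a discrete wavelet transform splits a time series into one coarse approximation plus detail coefficients at several scales, and that the split is lossless.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar normalisation


def haar_dwt_step(signal):
    """One Haar DWT level: return (approximation, detail), each half as long."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even at every level")
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Multi-level decomposition: [approx_L, detail_L, ..., detail_1]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    # Coarsest coefficients first, finest detail last.
    return [approx] + details[::-1]


def haar_idwt_step(approx, detail):
    """Invert one Haar level by interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SCALE)
        out.append((a - d) * SCALE)
    return out


def haar_waverec(coeffs):
    """Rebuild the signal from the output of haar_wavedec."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt_step(approx, detail)
    return approx


if __name__ == "__main__":
    # Hypothetical joint-angle trajectory with 8 samples (made-up values).
    trajectory = [0.0, 0.1, 0.4, 0.9, 1.0, 0.9, 0.4, 0.1]
    for band in haar_wavedec(trajectory, levels=2):
        print(len(band), band)
```

Each level halves the temporal resolution, so the coarse band summarises slow motion trends while the detail bands carry faster changes. A learned model could attend to these bands separately, which is the general idea behind pairing wavelets with attention.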
Problem

Research questions and friction points this paper is trying to address.

Improving humanoid robot perception with frequency-enhanced transformers
Enhancing model robustness via multi-scale time-frequency feature fusion
Increasing imitation learning success rates in simulation and real-world
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-Enhanced Wavelet-based Transformer framework
Multi-scale attention with time-frequency fusion
Wavelet decomposition with residual networks
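
As a rough illustration of "wavelet decomposition with residual networks", the sketch below weights the low- and high-frequency parts of a feature sequence and adds the weighted mix back onto the input through a skip connection. Everything here is an assumption made for illustration: the Haar split, the softmax gating, and the names `haar_split` and `residual_band_fusion` are hypothetical and are not taken from the paper, whose FE-EMA module is not specified on this page. In a trained model the band scores would come from learned layers; here they are passed in by hand.

```python
import math


def haar_split(x):
    """Split an even-length sequence into same-length low/high bands (low + high == x)."""
    if len(x) % 2 != 0:
        raise ValueError("sequence length must be even")
    low, high = [], []
    for i in range(0, len(x), 2):
        mean = 0.5 * (x[i] + x[i + 1])       # pairwise average: slow component
        half_diff = 0.5 * (x[i] - x[i + 1])  # pairwise difference: fast component
        low += [mean, mean]
        high += [half_diff, -half_diff]
    return low, high


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_band_fusion(x, band_scores):
    """Gate the two bands with softmax weights, then add the mix back to x.

    band_scores is a hypothetical pair (low_score, high_score); it stands in
    for whatever a learned attention layer would produce.
    """
    low, high = haar_split(x)
    w_low, w_high = softmax(band_scores)
    fused = [w_low * lo + w_high * hi for lo, hi in zip(low, high)]
    # Residual (skip) connection: the original features are always preserved.
    return [xi + fi for xi, fi in zip(x, fused)]


if __name__ == "__main__":
    features = [1.0, 3.0, 2.0, 2.0]  # made-up feature values
    print(residual_band_fusion(features, band_scores=(2.0, 0.0)))  # favours the low band
```

The residual path means that even a poorly chosen gate cannot erase the original time-domain features; the gate only decides how much of each frequency band is added on top.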
Authors
Jiaxin Huang
Jiangsu Provincial Key Laboratory of Food Advanced Manufacturing Equipment Technology, School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
Hanyu Liu
Key Laboratory of Material Simulation Methods and Software of MOE, Jilin University
Computational science, High pressure
Yunsheng Ma
Jiangsu Provincial Key Laboratory of Food Advanced Manufacturing Equipment Technology, School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
Jian Shen
Jiangsu Provincial Key Laboratory of Food Advanced Manufacturing Equipment Technology, School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
Yilin Zheng
Jiangsu Provincial Key Laboratory of Food Advanced Manufacturing Equipment Technology, School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
Jiayi Wen
PhD in Mathematics, University of California, San Diego
Mathematical Modeling, Monte Carlo Simulations, Variational Analysis, Numerical Computation
Baishu Wan
Jiangsu Provincial Key Laboratory of Food Advanced Manufacturing Equipment Technology, School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
Pan Li
Jiangsu Provincial Key Laboratory of Food Advanced Manufacturing Equipment Technology, School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
Zhigong Song
Jiangsu Provincial Key Laboratory of Food Advanced Manufacturing Equipment Technology, School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China