🤖 AI Summary
This paper addresses two challenges in surface electromyographic (sEMG) gesture recognition: distinguishing similar gestures that produce nearly identical muscle signals, and deploying models on resource-constrained embedded devices. It proposes WaveFormer, a lightweight Wavelet-Transformer architecture. The method introduces a learnable wavelet transform to jointly capture time-domain and frequency-domain characteristics, and a WaveletConv module that combines multi-level wavelet decomposition with depthwise separable convolution to balance accuracy against computational cost. The model contains only 3.1 million parameters and achieves 95.0% classification accuracy on the EPN612 dataset. After INT8 quantization, it attains an inference latency of 6.75 ms when profiled on a laptop with an Intel CPU, which is fast enough for real-time use. To the best of the authors' knowledge, this is the first work to combine learnable wavelet transforms with a lightweight Transformer for sEMG gesture recognition, improving fine-grained gesture discrimination and the practical feasibility of on-device inference.
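The two ingredients named for WaveletConv can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the Haar filters, the function names, and the channel and kernel sizes are assumptions chosen for illustration. In a learnable wavelet transform, the filter taps would presumably be trainable parameters initialised from a known wavelet rather than fixed constants.

```python
import math

# Haar analysis filters, used here only as an initial value (assumption);
# a learnable wavelet transform would treat these taps as trainable weights.
S = 1.0 / math.sqrt(2.0)
LOW_PASS = [S, S]
HIGH_PASS = [S, -S]


def dwt_step(signal, low=LOW_PASS, high=HIGH_PASS):
    """One decomposition level: 2-tap filtering with stride 2 (even-length input)."""
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        pair = signal[i:i + 2]
        approx.append(sum(w * x for w, x in zip(low, pair)))
        detail.append(sum(w * x for w, x in zip(high, pair)))
    return approx, detail


def multilevel_dwt(signal, levels):
    """Return [detail_1, ..., detail_L, approx_L] for a single channel."""
    bands, current = [], list(signal)
    for _ in range(levels):
        current, detail = dwt_step(current)
        bands.append(detail)
    bands.append(current)
    return bands


def conv1d_params(c_in, c_out, k):
    """Weight count of a standard 1-D convolution (bias omitted)."""
    return c_in * c_out * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise weights (one k-tap filter per channel) plus 1x1 pointwise weights."""
    return c_in * k + c_in * c_out


# Hypothetical 8-sample single-channel signal, decomposed over two levels.
bands = multilevel_dwt([1.0, 3.0, 2.0, 6.0, 4.0, 4.0, 0.0, 8.0], levels=2)
print([len(b) for b in bands])  # [4, 2, 2]

# Hypothetical layer: 64 input channels, 64 output channels, kernel size 7.
print(conv1d_params(64, 64, 7), depthwise_separable_params(64, 64, 7))  # 28672 4544
```

Each level halves the temporal resolution of the approximation band, so deeper levels summarise progressively lower frequencies. For the assumed 64-channel, kernel-size-7 layer, the depthwise separable form needs roughly one sixth of the weights of a standard convolution, which is the kind of saving that keeps a model near the stated 3.1 million parameters.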
📝 Abstract
Human-machine interaction, particularly in prosthetic and robotic control, has seen progress with gesture recognition via surface electromyographic (sEMG) signals. However, classifying similar gestures that produce nearly identical muscle signals remains a challenge and often reduces classification accuracy. Traditional deep learning models for sEMG gesture recognition are large and computationally expensive, which limits their deployment on resource-constrained embedded systems. In this work, we propose WaveFormer, a lightweight transformer-based architecture tailored for sEMG gesture recognition. Our model integrates time-domain and frequency-domain features through a novel learnable wavelet transform, enhancing feature extraction. In particular, the WaveletConv module, a multi-level wavelet decomposition layer with depthwise separable convolution, ensures both efficiency and compactness. With just 3.1 million parameters, WaveFormer achieves 95% classification accuracy on the EPN612 dataset, outperforming larger models. Furthermore, when profiled on a laptop equipped with an Intel CPU, the INT8-quantized model runs in real time, with an inference latency of 6.75 ms.
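The abstract does not say which INT8 quantization scheme is used. As a rough illustration of the general idea only, the sketch below applies symmetric per-tensor quantization to a handful of made-up weights; the scheme, the function names, and the values are all assumptions, not details from the paper.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights, not taken from the paper.
weights = [0.50, -1.27, 0.03, 0.90]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

print(codes)  # [50, -127, 3, 90]
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2 + 1e-12)  # True
```

Each weight is stored in one byte instead of the four used by float32, and the rounding error per weight is bounded by half the scale. Smaller weights and integer arithmetic are the usual reasons quantization lowers latency, although the 6.75 ms figure reported above depends on the authors' own runtime and hardware.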