Optimal Scalogram for Computational Complexity Reduction in Acoustic Recognition Using Deep Learning

📅 2025-05-19
🤖 AI Summary
The continuous wavelet transform (CWT) is computationally expensive, which limits its use in acoustic recognition, particularly for non-stationary audio signals. To address this, the paper proposes a learnable scalogram-structure optimization framework that jointly optimizes the wavelet kernel length and the scalogram hop size through a differentiable parametric scaling strategy, enabling end-to-end lightweight CWT feature extraction. The method preserves the robustness of CNN-based classification while substantially reducing computational overhead: across multiple acoustic recognition tasks, it achieves an average 47% reduction in FLOPs, incurs less than 0.3% accuracy degradation, and accelerates inference by 2.1×. Its core contribution is the first formulation of scalogram structure as learnable parameters, which departs from conventional fixed-scale designs and establishes a new paradigm for efficient time-frequency feature extraction.

📝 Abstract
The Continuous Wavelet Transform (CWT) is an effective tool for feature extraction in acoustic recognition using Convolutional Neural Networks (CNNs), particularly when applied to non-stationary audio. However, its high computational cost poses a significant challenge, often leading researchers to prefer alternative methods such as the Short-Time Fourier Transform (STFT). To address this issue, this paper proposes a method to reduce the computational complexity of CWT by optimizing the length of the wavelet kernel and the hop size of the output scalogram. Experimental results demonstrate that the proposed approach significantly reduces computational cost while maintaining the robust performance of the trained model in acoustic recognition tasks.
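
The two knobs named in the abstract, wavelet kernel length and scalogram hop size, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the complex Morlet wavelet, the `w0` value, the scale grid, and the function names `morlet_kernel` and `scalogram` are all assumptions chosen for illustration. The sketch truncates every wavelet to `kernel_len` samples and evaluates the transform only at every `hop`-th sample.

```python
import numpy as np

def morlet_kernel(scale, length, w0=6.0):
    """Complex Morlet wavelet sampled on `length` points (assumed wavelet, not from the paper)."""
    t = (np.arange(length) - (length - 1) / 2.0) / scale
    return np.exp(1j * w0 * t) * np.exp(-0.5 * t**2) / np.sqrt(scale)

def scalogram(signal, scales, kernel_len, hop):
    """Magnitude scalogram using truncated kernels, evaluated every `hop` samples."""
    half = kernel_len // 2
    # Zero-pad so that each output frame is centered on an input sample.
    padded = np.pad(signal, (half, kernel_len - 1 - half))
    # One row per output frame; each row is a window of kernel_len samples.
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel_len)[::hop]
    kernels = np.stack([morlet_kernel(s, kernel_len) for s in scales])
    # Correlate every frame with every conjugated kernel: shape (scales, frames).
    return np.abs(kernels.conj() @ windows.T)

# Illustrative usage: one second of a 440 Hz tone sampled at 16 kHz.
fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
S = scalogram(x, scales=np.geomspace(2, 64, 32), kernel_len=256, hop=128)
print(S.shape)  # (32, 125)
```

A shorter `kernel_len` reduces the work per output value and a larger `hop` reduces the number of output values, at the cost of truncating wide wavelets and coarsening time resolution. How far either can be pushed before recognition accuracy suffers is the question the paper studies.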
Problem

Research questions and friction points this paper is trying to address.

Reduce CWT computational cost in acoustic recognition
Optimize wavelet kernel length and scalogram hop size
Maintain model performance while lowering complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes wavelet kernel length for efficiency
Adjusts scalogram hop size to reduce cost
Maintains performance while lowering computation
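
To see why these two parameters dominate the cost, a rough operation count helps. The sketch below assumes a direct time-domain CWT, where each output value costs one multiply-accumulate per kernel sample; an FFT-based implementation would scale differently. The function name `cwt_macs` and all numeric settings are hypothetical and are not taken from the paper.

```python
import math

def cwt_macs(num_samples, num_scales, kernel_len, hop):
    """Approximate multiply-accumulates for a direct, strided CWT."""
    frames = math.ceil(num_samples / hop)  # number of output columns
    return num_scales * frames * kernel_len

# Illustrative settings only: one second of 16 kHz audio, 32 scales.
baseline = cwt_macs(16000, 32, kernel_len=1024, hop=1)
reduced = cwt_macs(16000, 32, kernel_len=256, hop=128)
print(baseline // reduced)  # 512
```

Under this model the cost is linear in the kernel length and inversely proportional to the hop size, so the two reductions multiply.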
Dang Thoai Phan
Software engineer
Audio and Speech Processing
Tuan Anh Huynh
University of Information Technology, VNUHCM - Maharishi International University, Fairfield, Iowa
Chatbot, Knowledge Base, LLM, AI, ML
Van Tuan Pham
Artificial Intelligence, Yokogawa Votiva Solutions, Ho Chi Minh City, Vietnam
Cao Minh Tran
Information Technology, Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam
Van Thuan Mai
Inha University, South Korea
Control Technology, Marine Hydrodynamics, Machine Learning & Deep Learning
Ngoc Quy Tran
Software Engineering, FPT University Hanoi, Hanoi, Vietnam