Soundscapes in Spectrograms: Pioneering Multilabel Classification for South Asian Sounds

📅 2026-03-09
🤖 AI Summary
This study addresses multi-label sound classification in South Asian soundscapes, where natural, human, and cultural sounds frequently overlap and strain traditional approaches built on Mel Frequency Cepstral Coefficients (MFCC). The work proposes a convolutional neural network (CNN) that takes spectrograms as input instead of handcrafted MFCC features, an approach the authors describe as novel for these acoustic environments. The authors report higher classification accuracy than existing MFCC-based techniques on both the SAS-KIIT and UrbanSound8K datasets. They present this result as groundwork for more robust audio classification in real-world settings with heavily overlapping sound sources.
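To make the spectrogram input concrete, the following is a minimal sketch of a log-power spectrogram computed with a short-time Fourier transform. It is not the authors' pipeline: the frame length, hop size, Hann window, sample rate, and the two-tone test signal are all assumptions chosen for illustration, and the summary does not say whether the paper uses linear or mel-scaled spectrograms.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Return one row per frame; each row holds log-power (dB) for bins 0..frame_len//2."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for readability; a practical pipeline would use an FFT.
            coeff = sum(
                chunk[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                for t in range(frame_len)
            )
            row.append(10 * math.log10(abs(coeff) ** 2 + eps))
        frames.append(row)
    return frames


# Hypothetical input: two overlapping tones standing in for two simultaneous sources.
sample_rate = 8000  # assumed, in Hz
signal = [
    math.sin(2 * math.pi * 1000 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 3000 * t / sample_rate)
    for t in range(256)
]
spec = log_spectrogram(signal)
```

With these assumed settings each frequency bin spans 125 Hz, so the two tones appear as separate ridges in the time-frequency grid. That kind of two-dimensional layout is what a CNN can treat as an image-like input.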

📝 Abstract
Environmental sound classification is a field of growing importance for urban monitoring and cultural soundscape analysis, especially within the acoustically rich environments of South Asia. These regions present a unique challenge as multiple natural, human, and cultural sounds often overlap, straining traditional methods that frequently rely on Mel Frequency Cepstral Coefficients (MFCC). This study introduces a novel spectrogram-based methodology with a superior ability to capture these complex auditory patterns. A Convolutional Neural Network (CNN) architecture is implemented to solve a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset. To demonstrate robustness and comparability, the approach is also validated using the renowned UrbanSound8K dataset. The results confirm that the proposed spectrogram-based method significantly outperforms existing MFCC-based techniques, achieving higher classification accuracy across both datasets. This improvement lays the groundwork for more robust and accurate audio classification systems in real-world applications.
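The multilabel setting can be illustrated with a short sketch of how a classifier's outputs are typically scored and decoded when several sounds may be present at once. The abstract does not state the paper's output layer or loss, so the per-label sigmoid, the binary cross-entropy loss, the 0.5 threshold, and the label names below are assumptions rather than the authors' configuration.

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy, treating each label as an independent yes/no decision."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)


def predict_labels(logits, names, threshold=0.5):
    """Return every label whose probability reaches the threshold (zero, one, or many)."""
    return [name for z, name in zip(logits, names) if sigmoid(z) >= threshold]


# Hypothetical label set and network outputs for one clip with overlapping sounds.
LABELS = ["rain", "traffic", "temple_bell", "speech"]
logits = [2.0, 1.2, -3.0, -0.4]
targets = [1, 1, 0, 0]

predicted = predict_labels(logits, LABELS)
loss = binary_cross_entropy(logits, targets)
```

Unlike a softmax, which forces the labels to compete for a single winner, independent sigmoids let the model report several sounds for the same clip, which suits recordings where sources overlap.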
Problem

Research questions and friction points this paper is trying to address.

environmental sound classification
multilabel classification
South Asian soundscapes
acoustic overlap
soundscape analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

spectrogram-based classification
multilabel sound classification
Convolutional Neural Network
South Asian soundscapes
environmental sound analysis
Sudip Chakrabarty
School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, India
Pappu Bishwas
School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, India
Rajdeep Chatterjee
Professor of Physics, IIT Roorkee
Theoretical Nuclear physics
Tathagata Bandyopadhyay
MSc Informatics, Technical University of Munich
3D Deep Learning, Computer Vision, Audio and Speech processing, Large Language Models, BCI
Digonto Biswas
School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, India
Bibek Howlader
School of Computer Engineering, American International University, Dhaka, Bangladesh