🤖 AI Summary
This study addresses multi-label sound classification in South Asian soundscapes, where natural, human, and cultural sounds frequently overlap and strain traditional approaches built on Mel Frequency Cepstral Coefficients (MFCC). The work proposes a convolutional neural network (CNN) that takes spectrograms as input and applies it to multi-label, multi-class classification in these acoustic environments. According to the authors, replacing handcrafted MFCC features with spectrogram inputs yields higher classification accuracy than existing MFCC-based techniques on both the SAS-KIIT and UrbanSound8K datasets. The authors present this result as groundwork for more robust audio classification systems in real-world settings with heavily overlapping sound sources.
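To make the term "spectrogram input" concrete, here is a minimal sketch of a magnitude spectrogram computed with a naive short-time Fourier transform. The frame length, hop size, Hann window, and two-tone test signal are illustrative assumptions, not the study's settings, which the summary does not state. MFCCs are normally derived from such a spectrum by applying a mel filterbank, a logarithm, and a discrete cosine transform, then keeping only the first few coefficients.

```python
import cmath
import math

def hann(n):
    # Periodic Hann window of length n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per frame.

    frame_len and hop are assumed values chosen for illustration only.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame at frequency bin k
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Toy signal (assumption): two overlapping tones at 128 Hz and 320 Hz
sample_rate = 1024
signal = [
    math.sin(2 * math.pi * 128 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 320 * t / sample_rate)
    for t in range(256)
]
spec = spectrogram(signal)
```

With a 16 Hz bin width (1024 / 64), the two tones fall in bins 8 and 20 of every frame, so both overlapping sources remain visible as separate peaks in the time-frequency grid. A practical pipeline would use an FFT routine from a numerical library instead of the naive DFT shown here.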
📝 Abstract
Environmental sound classification is a field of growing importance for urban monitoring and cultural soundscape analysis, especially within the acoustically rich environments of South Asia. These regions present a unique challenge as multiple natural, human, and cultural sounds often overlap, straining traditional methods that frequently rely on Mel Frequency Cepstral Coefficients (MFCC). This study introduces a novel spectrogram-based methodology with a superior ability to capture these complex auditory patterns. A Convolutional Neural Network (CNN) architecture is implemented to solve a demanding multilabel, multiclass classification problem on the SAS-KIIT dataset. To demonstrate robustness and comparability, the approach is also validated using the renowned UrbanSound8K dataset. The results confirm that the proposed spectrogram-based method significantly outperforms existing MFCC-based techniques, achieving higher classification accuracy across both datasets. This improvement lays the groundwork for more robust and accurate audio classification systems in real-world applications.
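The multilabel setting mentioned in the abstract differs from single-label classification in how the network's outputs are turned into predictions. The sketch below contrasts the two decision rules on the same output scores. The class names, the 0.5 threshold, and the use of per-class sigmoid outputs are assumptions for illustration, since the abstract does not describe the output layer of the CNN.

```python
import math

# Hypothetical class names, not taken from SAS-KIIT or UrbanSound8K
LABELS = ["birdsong", "traffic", "temple_bell"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    # Subtract the maximum for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multilabel_predict(logits, threshold=0.5):
    # Each class is scored independently, so any number of labels can be active
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

def singlelabel_predict(logits):
    # Softmax scores compete, so exactly one label is returned
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))]

# Example scores for a clip where two sources overlap (made-up values)
logits = [2.0, 1.5, -3.0]
multi = multilabel_predict(logits)
single = singlelabel_predict(logits)
```

For these scores the multilabel rule reports both "birdsong" and "traffic", while the single-label rule reports only "birdsong". This is the property that matters when several sound sources are active in the same clip.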