Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty single signal representations have in being both noise-robust and discriminative in complex underwater acoustic environments. The authors propose a dual-encoder architecture that processes raw waveforms and time-frequency spectrograms in parallel, using pretrained backbones adapted with parameter-efficient fine-tuning (PEFT). A differentiable Choquet integral fuses the waveform and spectrogram branches, with learned class-specific fuzzy measures weighting each representation and the interaction between them. The learned measures also provide interpretability, since they show which representation each class relies on, and the fusion acts as a dynamic gate that shifts weight toward the representation less corrupted by asymmetric underwater channel distortions, mitigating non-stationary interference. Experiments on the DeepShip and ShipsEar datasets show classification improvements over independent single-encoder baselines while restricting the number of trainable parameters relative to full fine-tuning, which lowers both computational cost and overfitting risk.
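
To make the fusion step concrete, here is a minimal sketch of a discrete Choquet integral over two sources. It is not the authors' implementation: the source names, the measure values, and the function `choquet_integral` are illustrative assumptions, and the paper's version is learned end to end inside a neural network rather than computed in plain Python.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of `scores` with respect to a fuzzy measure.

    scores:  dict mapping source name -> evidence value (e.g. a class score).
    measure: dict mapping frozenset of source names -> value in [0, 1],
             monotone, with measure[frozenset()] == 0 and measure[all] == 1.
    """
    # Visit sources from the strongest evidence to the weakest.
    order = sorted(scores, key=scores.get, reverse=True)
    total, previous, coalition = 0.0, 0.0, set()
    for name in order:
        coalition.add(name)
        current = measure[frozenset(coalition)]
        # Each source is weighted by the measure gained when it joins the coalition.
        total += scores[name] * (current - previous)
        previous = current
    return total


# Hypothetical fuzzy measure for one class (values are assumptions):
# the spectrogram alone is trusted more than the waveform alone.
measure = {
    frozenset(): 0.0,
    frozenset({"waveform"}): 0.3,
    frozenset({"spectrogram"}): 0.6,
    frozenset({"waveform", "spectrogram"}): 1.0,
}

fused_a = choquet_integral({"waveform": 0.9, "spectrogram": 0.4}, measure)
fused_b = choquet_integral({"waveform": 0.2, "spectrogram": 0.8}, measure)
print(fused_a, fused_b)
```

The two calls give approximately 0.55 and 0.56. The weights applied to each source depend on which source currently has the larger score, which is the sense in which the aggregation is input-dependent. When the two singleton measure values sum to 1, the integral reduces to an ordinary weighted average; other values let the measure encode interaction between the two representations.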

📝 Abstract

Underwater acoustic classification has a wide array of oceanic applications, but faces challenges due to an increasingly complex acoustic environment. Waveform and spectrogram representations have been the primary acoustic features used for classification tasks in this domain. Spectrograms model harmonic dependencies, but these reduced representations can filter out acoustic features relevant for discrimination. While phase information from the waveform allows full characterization of the signal, the original waveform can be noisy and complex, making it difficult for models to process directly. This paper proposes a dual-encoder neural architecture that processes acoustic waveforms and spectrograms simultaneously, leveraging pre-trained backbones and parameter-efficient fine-tuning modules to enable domain adaptation. To combine these adapted branches, a novel differentiable fuzzy aggregation mechanism based on the Choquet integral is introduced to balance the temporal and spectral representations. This fusion strategy not only yields higher classification accuracy but also provides interpretability: analyzing the learned fuzzy measures reveals class-specific shifts in the network's reliance on each representation. By dynamically shifting attention to the representation least corrupted by potential asymmetric channel distortions, the proposed gating mechanism mitigates the non-stationary challenges of the underwater environment. Evaluations on the DeepShip and ShipsEar datasets demonstrate that the proposed architecture achieves classification improvements over independent single-encoder baselines while restricting the trainable parameter space. This mitigates the risk of overfitting on limited acoustic datasets and alleviates the computational costs associated with fully fine-tuning foundation models.
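
As a rough illustration of why parameter-efficient fine-tuning restricts the trainable parameter space, the sketch below compares the parameter count of one dense weight matrix with that of a low-rank adapter on the same layer. The abstract does not name the PEFT method used, so the low-rank (LoRA-style) adapter, the layer size of 768, and the rank of 8 are all assumptions chosen for illustration.

```python
def adapter_parameter_counts(d_in, d_out, rank):
    """Return (full, low_rank) trainable-parameter counts for one linear layer.

    full:     parameters updated when the d_out x d_in weight is fine-tuned directly.
    low_rank: parameters in two factors of shapes (d_out x rank) and (rank x d_in),
              trained while the original weight stays frozen (assumed LoRA-style).
    """
    full = d_in * d_out
    low_rank = rank * (d_in + d_out)
    return full, low_rank


# Hypothetical layer size and rank, not taken from the paper.
full, low_rank = adapter_parameter_counts(d_in=768, d_out=768, rank=8)
print(full, low_rank, low_rank / full)
```

For these assumed sizes the adapter trains 12,288 parameters against 589,824 for the full matrix, about 2% of the original count per layer.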

Problem

Research questions and friction points this paper is trying to address.

underwater acoustic classification

waveform representation

spectrogram representation

non-stationary environment

feature fusion

Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-encoder architecture

Choquet integral fusion

parameter-efficient fine-tuning