Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis

📅 2025-09-06
🤖 AI Summary
Problem: Spectrogram interpretation in marine mammal vocalization analysis still relies heavily on manual annotation, and existing vision-language models (VLMs) are not adapted to bioacoustic spectrograms. Method: The paper proposes a VLM–LLM collaborative framework that requires neither fine-tuning nor manual annotation. A pre-trained VLM extracts visual patterns directly from acoustic spectrograms, and a large language model (LLM) handles semantic interpretation, domain-knowledge infusion, and cross-modal reasoning, so that underwater bioacoustic knowledge can be built and validated automatically. Contribution/Results: Experiments are reported to show effective zero-shot identification of vocalization patterns, with improvements in both classification accuracy and explanatory fidelity. The framework offers an approach to automated, knowledge-enhanced analysis of spectrograms that normally require expert reading, connecting visual representation learning with bioacoustic domain reasoning.

📝 Abstract
Marine mammal vocalization analysis depends on interpreting bioacoustic spectrograms. Vision Language Models (VLMs) are not trained on these domain-specific visualizations. We investigate whether VLMs can extract meaningful patterns from spectrograms visually. Our framework integrates VLM interpretation with LLM-based validation to build domain knowledge. This enables adaptation to acoustic data without manual annotation or model retraining.

Problem

Research questions and friction points this paper is trying to address.

Adapting vision language models to underwater bioacoustic spectrograms
Enabling meaningful pattern extraction without manual annotation
Integrating VLM interpretation with LLM-based validation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-augmented VLM for spectrogram analysis
LLM-based validation for domain knowledge integration
No manual annotation or model retraining required
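
The interplay between VLM interpretation and LLM-based validation can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the source names no specific models, prompts, or APIs, so `describe` and `validate` are hypothetical callables standing in for a VLM and an LLM, `KnowledgeBase` is an invented container, and the stub outputs (such as "upsweep contour") are made-up placeholder text rather than results from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

# Hypothetical interfaces (assumptions, not from the paper):
#   Describe: (spectrogram_path, known_facts) -> textual description from a VLM
#   Validate: (description, known_facts) -> True if an LLM accepts the description
Describe = Callable[[str, List[str]], str]
Validate = Callable[[str, List[str]], bool]


@dataclass
class KnowledgeBase:
    """Accumulates validated descriptions; no model weights are ever updated."""
    facts: List[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        # Skip exact duplicates so repeated inputs do not inflate the knowledge base.
        if fact not in self.facts:
            self.facts.append(fact)


def build_knowledge(
    paths: Iterable[str],
    describe: Describe,
    validate: Validate,
    kb: Optional[KnowledgeBase] = None,
) -> KnowledgeBase:
    """Describe each spectrogram, keep only descriptions the validator accepts."""
    kb = kb if kb is not None else KnowledgeBase()
    for path in paths:
        # Facts validated so far are passed back in as context (knowledge augmentation).
        description = describe(path, kb.facts)
        if validate(description, kb.facts):
            kb.add(description)
    return kb


# Stub models so the sketch runs without any external service (purely illustrative).
def stub_vlm(path: str, facts: List[str]) -> str:
    return f"upsweep contour in {path}" if "whistle" in path else ""


def stub_llm(description: str, facts: List[str]) -> bool:
    return bool(description.strip())  # reject empty descriptions


kb = build_knowledge(
    ["whistle_01.png", "noise_01.png", "whistle_01.png"], stub_vlm, stub_llm
)
print(kb.facts)
```

Keeping the models behind plain callables reflects the "no retraining" constraint stated above: adaptation happens only through the accumulated text in the knowledge base, and either model could be swapped without touching the loop.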