🤖 AI Summary
Spectrogram interpretation in marine mammal vocalization analysis remains heavily reliant on manual annotation, while existing vision-language models (VLMs) lack domain-specific adaptation to bioacoustics. Method: This paper proposes a fine-tuning-free, annotation-free VLM–LLM collaborative framework. It leverages a pre-trained VLM to extract visual features directly from acoustic spectrograms and integrates a large language model (LLM) for semantic interpretation, domain-knowledge infusion, and cross-modal reasoning, enabling autonomous construction and validation of underwater bioacoustic knowledge. Contribution/Results: Experiments demonstrate effective zero-shot identification of vocalization patterns, with significant improvements in both classification accuracy and explanatory fidelity. The framework establishes a novel paradigm for automated, knowledge-enhanced, expert-level analysis of acoustic spectrograms, bridging the gap between visual representation learning and bioacoustic domain reasoning.
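The paper's exact prompts, models, and validation criteria are not reproduced here; as a rough illustration of the two-stage collaboration, the sketch below chains a VLM description pass to an LLM validation pass, assuming an OpenAI-compatible endpoint (the model name, prompt wording, and the `describe_spectrogram` / `validate_with_llm` helpers are all hypothetical, not the authors' implementation):

```python
import base64
from openai import OpenAI  # assumption: any OpenAI-compatible VLM/LLM endpoint

client = OpenAI()

def describe_spectrogram(png_path: str) -> str:
    """VLM stage: ask a general-purpose VLM to describe visual structure only."""
    with open(png_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; the paper's VLM is not specified here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Describe the time-frequency structure in this spectrogram: "
                    "contour shapes, frequency range, duration, and repetition."
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

def validate_with_llm(description: str) -> str:
    """LLM stage: check the description against bioacoustic domain knowledge
    and propose a candidate vocalization label (hypothetical prompt)."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative; the paper's LLM is not specified here
        messages=[
            {"role": "system", "content": (
                "You are a marine-bioacoustics expert. Given a visual description "
                "of a spectrogram, judge whether it is internally consistent and "
                "name the most plausible vocalization type (e.g. whistle, click "
                "train, moan)."
            )},
            {"role": "user", "content": description},
        ],
    )
    return resp.choices[0].message.content

description = describe_spectrogram("spectrogram.png")
print(validate_with_llm(description))
```

The division of labor mirrors the summary above: the VLM only reads visual structure, while the LLM supplies domain semantics and validates the reading before committing to a label.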
📝 Abstract
Marine mammal vocalization analysis depends on interpreting bioacoustic spectrograms, yet vision-language models (VLMs) are not trained on these domain-specific visualizations. We investigate whether VLMs can nonetheless extract meaningful patterns from spectrograms visually. Our framework integrates VLM interpretation with LLM-based validation to build domain knowledge, enabling adaptation to acoustic data without manual annotation or model retraining.
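For context on what the VLM actually sees, here is a minimal sketch of rendering a recording as the spectrogram image consumed by the pipeline sketch above, using librosa and matplotlib (the file name and STFT parameters are illustrative assumptions, not the paper's settings):

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

# Load a recording and render a log-magnitude spectrogram as an ordinary image,
# i.e. the visual input a VLM receives. STFT settings are illustrative defaults.
y, sr = librosa.load("humpback_call.wav", sr=None)  # hypothetical file name
S_db = librosa.amplitude_to_db(
    np.abs(librosa.stft(y, n_fft=2048, hop_length=512)), ref=np.max
)

fig, ax = plt.subplots(figsize=(8, 4))
librosa.display.specshow(S_db, sr=sr, hop_length=512,
                         x_axis="time", y_axis="hz", ax=ax)
ax.set(title="Spectrogram (dB)")
fig.savefig("spectrogram.png", dpi=150, bbox_inches="tight")
```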