Listen, Look, and Learn: Learning Without Forgetting through SAM-Audio

📅 2026-06-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses catastrophic forgetting in audio-visual class-incremental learning by introducing SAM-Audio to this task for the first time. It proposes an audio-guided visual attention mechanism that uses SAM-Audio's dense multimodal representations to strengthen cross-modal associations. To preserve knowledge of previously learned classes, the authors design a dual-level knowledge distillation strategy that operates at both the feature and logit levels. The proposed method consistently outperforms current state-of-the-art approaches across multiple audio-visual class-incremental learning benchmarks, indicating that the design mitigates catastrophic forgetting and improves continual learning.
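As a rough illustration of what audio-guided visual attention could look like, the sketch below uses audio tokens as queries over visual tokens in plain scaled dot-product cross-attention. This is only one plausible reading of the summary: the function name `audio_guided_attention`, the token shapes, and the omission of learned projection matrices are all assumptions, and none of it is taken from the paper or from SAM-Audio's actual interface.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def audio_guided_attention(audio_tokens, visual_tokens):
    """Cross-attention with audio queries over visual keys and values.

    audio_tokens:  list of A vectors, each of length d (hypothetical shape)
    visual_tokens: list of V vectors, each of length d (hypothetical shape)
    Returns (attended, weights): A audio-conditioned visual vectors and the
    A x V attention weights. Learned projections are omitted for brevity.
    """
    d = len(audio_tokens[0])
    scale = 1.0 / math.sqrt(d)
    attended, weights = [], []
    for query in audio_tokens:
        # Similarity between this audio token and every visual token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in visual_tokens
        ]
        w = softmax(scores)
        # Weighted sum of visual tokens: visual content the audio "points at".
        mixed = [
            sum(w_j * v[i] for w_j, v in zip(w, visual_tokens))
            for i in range(d)
        ]
        attended.append(mixed)
        weights.append(w)
    return attended, weights


# Toy inputs: two audio tokens, three visual tokens, dimension 2.
audio = [[1.0, 0.0], [0.0, 1.0]]
visual = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
attended, weights = audio_guided_attention(audio, visual)
```

In this toy case the first audio token places most of its weight on the first visual token and the second audio token on the second, which is the intended effect of letting audio decide which visual regions matter.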
📝 Abstract
Class-Incremental Learning (CIL) aims to continuously learn new classes without forgetting previously acquired knowledge. While recent CIL advances have spurred significant interest across various modalities, the audio-visual setting remains underexplored. Furthermore, although foundational multimodal models like SAM-Audio encapsulate rich static priors, our empirical analysis reveals that these representations struggle in incremental settings. This work bridges this gap by integrating SAM-Audio's audio-visual priors into the CIL setting. Specifically, we leverage its dense audio and visual representations and employ a novel guided attention strategy where the audio features contextually guide the visual representations. To further mitigate catastrophic forgetting, we introduce dual-level distillation objectives at both the feature and logit levels. Extensive evaluations on audio-visual CIL benchmarks demonstrate that our approach consistently outperforms state-of-the-art methods.
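To make the idea of distilling at both the feature and logit levels concrete, here is a minimal sketch under stated assumptions. It uses mean squared error between old-model and new-model features and a temperature-softened KL divergence over the old-class logits. These are common choices in the continual learning literature, not the paper's confirmed objectives, and the names `feature_distillation`, `logit_distillation`, `dual_level_loss`, `lambda_feat`, and `lambda_logit` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def feature_distillation(old_feat, new_feat):
    """Mean squared error between frozen old-model and new-model features."""
    return sum((o - n) ** 2 for o, n in zip(old_feat, new_feat)) / len(old_feat)


def logit_distillation(old_logits, new_logits, temperature=2.0):
    """KL(old || new) over softened distributions of the old classes only.

    new_logits may be longer than old_logits because new classes were added;
    only the leading entries (the old classes) are compared.
    """
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[: len(old_logits)], temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * kl


def dual_level_loss(old_feat, new_feat, old_logits, new_logits,
                    lambda_feat=1.0, lambda_logit=1.0):
    """Weighted sum of the feature-level and logit-level distillation terms."""
    return (lambda_feat * feature_distillation(old_feat, new_feat)
            + lambda_logit * logit_distillation(old_logits, new_logits))


# The new model agrees with the old one on old classes: no penalty.
unchanged = dual_level_loss([1.0, 2.0], [1.0, 2.0], [0.5, 1.5], [0.5, 1.5, 3.0])
# The new model has drifted in both features and old-class logits: positive penalty.
drifted = dual_level_loss([1.0, 2.0], [0.0, 2.5], [0.5, 1.5], [1.5, 0.5, 3.0])
```

In a full training loop this term would be added to the usual classification loss on the new classes, which is left out here. The two weights control how strongly the model is anchored to its earlier behaviour at each level.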
Problem

Research questions and friction points this paper is trying to address.

Class-Incremental Learning
Audio-Visual Learning
Catastrophic Forgetting
Multimodal Models
SAM-Audio
Innovation

Methods, ideas, or system contributions that make the work stand out.

Class-Incremental Learning
SAM-Audio
Audio-Visual Learning
Guided Attention
Dual-Level Distillation