🤖 AI Summary
This work addresses the underexplored problem of editing abstract auditory attribute knowledge in Large Audio-Language Models (LALMs). Existing knowledge editing methods focus predominantly on the text and vision modalities and neglect audio-specific characteristics. To bridge this gap, we introduce SAKE, the first benchmark dedicated to auditory-attribute knowledge editing. SAKE targets updates to abstract auditory attributes and defines a four-dimensional evaluation protocol: reliability, generality, audio/text locality, and portability. We systematically evaluate seven state-of-the-art editing methods on two LALMs, revealing critical deficiencies in preserving intra-attribute knowledge unrelated to the edit, generalizing edits to cross-modal reasoning, and maintaining stability under sequential edits. This work lays a foundation for auditory knowledge editing, providing a benchmark, an evaluation methodology, and a clear identification of the key challenges in extending multimodal knowledge editing to the audio domain.
📝 Abstract
Knowledge editing offers an efficient way to update model knowledge without full retraining, but prior work has concentrated almost exclusively on textual or visual modalities. We introduce SAKE, the first benchmark specifically designed for editing auditory attribute knowledge in Large Audio-Language Models (LALMs). Rather than factual updates, SAKE targets several abstract auditory attributes, capturing knowledge types that go beyond conventional textual and visual domains. We benchmark seven editing methods on two LALMs along four dimensions: reliability, generality, audio/text locality, and portability. Results highlight challenges such as preserving intra-attribute knowledge unrelated to the edit, generalizing edits to multimodal reasoning, and maintaining edits under sequential updates. SAKE provides a principled framework to study how knowledge editing extends to the auditory modality, opening new directions for maintaining and adapting LALMs in diverse real-world scenarios.
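To make the four evaluation dimensions concrete, below is a minimal Python sketch of how a single edit could be scored along reliability, generality, audio/text locality, and portability. Everything here is illustrative: the `EditCase` fields, the `answer` method, and the scoring-by-exact-match convention are hypothetical placeholders, not SAKE's actual API or metrics.

```python
# Illustrative sketch of the four knowledge-editing metrics for one edit.
# All interfaces (EditCase fields, Model.answer) are hypothetical, not SAKE's API.

from dataclasses import dataclass, field


def mean(bools) -> float:
    """Fraction of True values; 0.0 for an empty probe set."""
    vals = [float(b) for b in bools]
    return sum(vals) / len(vals) if vals else 0.0


@dataclass
class EditCase:
    """One auditory-attribute edit plus its evaluation probes."""
    edit_prompt: str       # e.g., an audio clip paired with an attribute question
    edit_target: str       # the new attribute label the model should adopt
    rephrased_prompts: list[str] = field(default_factory=list)   # generality probes
    portability_prompts: list[tuple[str, str]] = field(default_factory=list)  # (prompt, expected) reasoning hops
    locality_prompts: list[str] = field(default_factory=list)    # unrelated audio/text queries


def evaluate_edit(base_model, edited_model, case: EditCase) -> dict[str, float]:
    """Score one edit along the four dimensions."""
    # Reliability: the edited model returns the new answer on the edit prompt itself.
    reliability = float(edited_model.answer(case.edit_prompt) == case.edit_target)

    # Generality: the edit holds under paraphrased or re-recorded probes.
    generality = mean(
        edited_model.answer(p) == case.edit_target for p in case.rephrased_prompts
    )

    # Portability: the edit transfers to downstream reasoning questions.
    portability = mean(
        edited_model.answer(p) == expected for p, expected in case.portability_prompts
    )

    # Locality: on unrelated audio/text inputs, the edited model should
    # still agree with the original (pre-edit) model.
    locality = mean(
        edited_model.answer(p) == base_model.answer(p) for p in case.locality_prompts
    )

    return {
        "reliability": reliability,
        "generality": generality,
        "portability": portability,
        "locality": locality,
    }
```

Under this framing, locality is the only dimension scored against the pre-edit model rather than the edit target, which is what lets it detect collateral damage to knowledge the edit was never meant to touch.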