MusicLIME: Explainable Multimodal Music Understanding

📅 2024-09-16
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the lack of interpretability in multimodal music understanding models, this paper proposes MusicLIME, the first model-agnostic explanation framework designed specifically for the cross-modal interaction between music audio and lyrics. Unlike existing methods that perform unimodal attribution in isolation, MusicLIME introduces a perturbation-based LIME variant tailored to musical signals, jointly models cross-modal feature coupling, and aggregates local explanations into global ones, enabling fine-grained joint attribution and coherent multimodal interpretability visualizations. It additionally incorporates a multimodal attention-consistency check to verify explanation reliability. Extensive evaluation on state-of-the-art music classification and sentiment analysis models shows that MusicLIME's explanations achieve 87% agreement among domain-expert musicians, substantially improving decision trustworthiness and bias detection.
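At a high level, the perturbation-based LIME variant described above works by randomly switching interpretable components (audio segments and lyric words) on and off in a single joint feature space and fitting a weighted linear surrogate to the black-box outputs. The sketch below illustrates only that general LIME mechanism; the toy black-box model, the feature layout, and the Ridge surrogate are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def black_box(features):
    # Hypothetical classifier score: depends on one audio feature (index 1)
    # and one lyric feature (index 4). Stands in for a real multimodal model.
    return 0.7 * features[:, 1] + 0.3 * features[:, 4]

def lime_style_joint_explain(instance, predict_fn, n_samples=500):
    """LIME-style local attribution over one joint audio+lyric vector.

    `instance` is a binary mask of interpretable components (here: audio
    segments followed by lyric words). Components are randomly switched
    off, and a weighted linear surrogate is fit to the model's outputs.
    """
    d = instance.shape[0]
    masks = rng.integers(0, 2, size=(n_samples, d))  # random on/off perturbations
    masks[0] = 1                                     # keep the unperturbed instance
    preds = predict_fn(masks * instance)             # query the black box
    proximity = masks.sum(axis=1) / d                # weight samples close to the original
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, preds, sample_weight=proximity)
    return surrogate.coef_                           # joint importance per component

# 3 audio components + 3 lyric components, all present in the instance
instance = np.ones(6)
weights = lime_style_joint_explain(instance, black_box)
```

Because audio and lyric components sit in one perturbation space, the surrogate's coefficients rank contributions across both modalities at once, which is the kind of joint view a unimodal explainer cannot provide.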

📝 Abstract
Multimodal models are critical for music understanding tasks, as they capture the complex interplay between audio and lyrics. However, as these models become more prevalent, the need for explainability grows: understanding how these systems make decisions is vital for ensuring fairness, reducing bias, and fostering trust. In this paper, we introduce MusicLIME, a model-agnostic feature importance explanation method designed for multimodal music models. Traditional unimodal methods analyze each modality separately, without considering the interaction between them, which often leads to incomplete or misleading explanations. MusicLIME instead reveals how audio and lyrical features interact and contribute to predictions, providing a holistic view of the model's decision-making. Additionally, we enhance local explanations by aggregating them into global explanations, giving users a broader perspective of model behavior. Through this work, we contribute to improving the interpretability of multimodal music models, empowering users to make informed choices and fostering more equitable, fair, and transparent music understanding systems.
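The abstract's local-to-global aggregation can be pictured, under simple assumptions, as averaging the absolute per-instance importances of each interpretable feature across a collection of songs and ranking the result. The feature names and scores below are made up for illustration; the paper's actual aggregation may differ.

```python
import numpy as np

def aggregate_global(local_explanations, feature_names):
    """Turn per-instance (local) importance vectors into one global ranking
    by mean absolute importance. A plausible sketch of a local-to-global
    step, not the paper's exact algorithm.
    """
    mean_abs = np.abs(np.asarray(local_explanations)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]  # most important feature first
    return [(feature_names[i], float(mean_abs[i])) for i in order]

# Hypothetical local explanations for two songs over three joint features
local_exps = [
    [0.6, -0.1, 0.2],   # song 1
    [0.5, 0.0, -0.3],   # song 2
]
ranking = aggregate_global(local_exps, ["audio:tempo", "lyric:'love'", "lyric:'dark'"])
```

Averaging absolute values (rather than signed values) keeps a feature that pushes predictions strongly in opposite directions across songs from cancelling out to an apparent importance of zero.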
Problem

Research questions and friction points this paper is trying to address.

Music Understanding
Multimodal Music Models
Explainable AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

MusicLIME
Multimodal Music Models
Interpretability
Theodoros Sotirou
National Technical University of Athens, Athens, Greece
Vassilis Lyberatos
PhD Student @ National Technical University of Athens
Orfeas Menis-Mastromichalakis
National Technical University of Athens, Athens, Greece
G. Stamou
National Technical University of Athens, Athens, Greece