Identifiable Multimodal Causal Representation Learning under Partial Latent Sharing

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses non-identifiability in multimodal causal representation learning, where a partially shared latent structure makes it hard to guarantee an interpretable and robust representation. The authors establish component-wise identifiability guarantees for causal latent representations in a setting that is nonlinear, undercomplete (each modality has more observed than latent variables), and partially shared across modalities, without imposing any parametric distribution on the latent variables. To recover the shared structure, they introduce a differentiable module based on the Wasserstein distance that can be added to existing neural network architectures with minimal changes. The authors report that experiments on synthetic and realistic datasets show their approach outperforming state-of-the-art methods.

📝 Abstract

Causal representation learning (CRL) seeks to uncover meaningful latent variables and their corresponding causal structure from high-dimensional observational data. Identifiability is a crucial property in CRL, as it ensures the recovery of the mechanisms behind the data generation process, and hence the interpretability and robustness of the representation. Proving identifiability in CRL is intrinsically difficult, and in this work we address an even more challenging setting: multimodality. We consider multimodal observed data with a partially shared latent structure. Each modality is generated, through nonlinear mixing functions, from a specific subset of causal latent variables. Under flexible assumptions and without imposing any parametric distribution on the latent variables, we establish component-wise identifiability guarantees for the causal latent representation. Our identifiability results furthermore apply to the undercomplete scenario, where each modality has more observed than latent variables. To instantiate our theoretical analysis, we introduce a Wasserstein-based module to recover the partially shared latent structure. Because it is differentiable, this module can be easily integrated into all types of architecture with only minimal changes. Extensive experiments on synthetic and realistic datasets validate the superiority of our approach over state-of-the-art (SOTA) methods.
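
The abstract does not describe the Wasserstein-based module in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that shared latent components can be flagged by comparing the marginal distributions of each pair of learned components across two modalities, using the sort-based empirical 1-Wasserstein distance between equal-size one-dimensional samples. The function names (`wasserstein_1d`, `match_shared`), the threshold, and the toy data are all hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size samples this is the mean absolute difference between
    the sorted values (order statistics) of the two samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def match_shared(latents_a, latents_b, threshold):
    """Pair each component of modality A with its closest component of B.

    latents_a[i] and latents_b[j] are lists of samples of one latent
    component. A pair is kept only if its distance is below `threshold`
    (a hypothetical hyperparameter, not taken from the paper).
    """
    pairs = []
    for i, za in enumerate(latents_a):
        dists = [wasserstein_1d(za, zb) for zb in latents_b]
        j = min(range(len(dists)), key=dists.__getitem__)
        if dists[j] < threshold:
            pairs.append((i, j))
    return pairs


# Toy data (assumed for illustration): one shared and one private component.
shared = [i / 100 for i in range(100)]
private_a = [5 + i / 100 for i in range(100)]
private_b = [-5 - i / 50 for i in range(100)]

latents_a = [shared, private_a]
latents_b = [private_b, list(reversed(shared))]  # same values, different order

pairs = match_shared(latents_a, latents_b, threshold=0.1)
print(pairs)  # [(0, 1)]
```

In a trainable model the same sort-based quantity would typically be computed with an automatic-differentiation framework, where it is differentiable almost everywhere with respect to the latent samples; that is one plausible reading of why a Wasserstein-based module can be added to an existing architecture with few changes.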

Problem

Research questions and friction points this paper is trying to address.

causal representation learning

identifiability

multimodal data

latent variable

partial latent sharing

Innovation

Methods, ideas, or system contributions that make the work stand out.

identifiable representation learning

multimodal causal representation

partial latent sharing