CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts

📅 2026-06-11

🤖 AI Summary

Existing neural Granger causal discovery methods are sensitive to distribution shifts and regime changes in real-world time series, which often produce entangled representations and spurious causal relationships. To address this, the work proposes CausalMoE, a billion-scale multimodal Granger causal foundation model built around a pattern-routed mixture of heterogeneous experts that separates shared dynamics from regime-specific mechanisms at the patch level. The model uses causality-aware self-attention across variables to produce sparse, interpretable Granger causal graphs, and aligns numerical signals with large language models and vision-language models to inject textual and visual priors that regularize causal estimation. The authors report a new state of the art on fully supervised benchmarks and effective generalization to few-shot settings, where conventional methods typically fail.

📝 Abstract

Granger Causal Discovery (GCD) is fundamental to analyzing temporal dependencies in complex systems. However, existing neural GCD methods predominantly rely on a "one-size-fits-all" paradigm and struggle to capture the distribution shifts and dynamic regime changes inherent in real-world time series. This often leads to entangled representations and spurious causal graphs. In this paper, we propose CausalMoE, a billion-scale multimodal Granger causal foundation model that explicitly models patch-level heterogeneity. CausalMoE introduces a Pattern-Routed Mixture of Heterogeneous Experts, which dynamically identifies latent temporal patterns and routes patches to specialized domain experts, effectively decoupling regime-specific mechanisms from shared dynamics. To ensure interpretable graph recovery, we design a Causality-Aware Self-Attention mechanism operating across variables, yielding sparse Granger causal graphs via proximal optimization. Furthermore, CausalMoE is the first to integrate large language models (LLMs) and vision-language models (VLMs) to align numerical signals with textual and visual priors, regularizing causal estimation in complex scenarios. Extensive experiments demonstrate that CausalMoE establishes a new state of the art on fully supervised benchmarks, while effectively generalizing to few-shot settings where traditional methods fail.
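The abstract does not spell out how patches are routed, so the following is only a minimal sketch of the general idea, assuming a standard top-k softmax gate over dot-product similarity between each patch and a per-expert pattern vector. The names `route_patches` and `prototypes`, the similarity measure, and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_patches(patches, prototypes, top_k=1):
    """Assign each patch to its top_k experts (assumed gating scheme).

    patches:    list of feature vectors, one per time-series patch
    prototypes: list of pattern vectors, one per expert (hypothetical)
    Returns, for each patch, a list of (expert_index, gate_weight) pairs
    whose weights are renormalized to sum to 1 over the selected experts.
    """
    assignments = []
    for patch in patches:
        # Similarity of this patch to every expert's pattern vector.
        scores = [sum(p * q for p, q in zip(patch, proto)) for proto in prototypes]
        gates = softmax(scores)
        # Keep only the top_k experts with the largest gate values.
        top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
        norm = sum(gates[i] for i in top)
        assignments.append([(i, gates[i] / norm) for i in top])
    return assignments


# Toy example: three 2-dimensional patches and two experts.
patches = [[1.0, 0.0], [0.0, 2.0], [0.9, 0.1]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
routing = route_patches(patches, prototypes, top_k=1)
```

In this toy setup the first and third patches resemble the first pattern vector and go to expert 0, while the second patch goes to expert 1. A real model would learn both the patch features and the gate, and would typically add a load-balancing term so that no single expert absorbs every patch.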

Problem

Research questions and friction points this paper is trying to address.

Granger Causal Discovery

distribution shift

dynamic regime change

spurious causal graphs

time series

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pattern-Routed Mixture of Experts

Granger Causal Discovery

Causality-Aware Self-Attention
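
The abstract states that sparse Granger causal graphs are obtained via proximal optimization but does not name the penalty. As an illustration only, the sketch below assumes a group-lasso penalty over the lag weights of each source variable, whose proximal operator is group soft-thresholding; an edge is kept when its weight group survives the shrinkage. The function names, the `coeffs` layout, and the threshold value are hypothetical.

```python
import math


def group_soft_threshold(weights, threshold):
    """Proximal operator of threshold * ||w||_2 for one group of weights."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= threshold:
        # The whole group is zeroed, which removes the corresponding edge.
        return [0.0] * len(weights)
    scale = 1.0 - threshold / norm
    return [scale * w for w in weights]


def sparse_granger_graph(coeffs, threshold):
    """Build a binary adjacency matrix from grouped lag weights.

    coeffs[i][j] holds the lag weights of source variable j when
    predicting target variable i (assumed layout).
    adjacency[i][j] == 1 means "j Granger-causes i" after shrinkage.
    """
    adjacency = []
    for row in coeffs:
        shrunk = [group_soft_threshold(group, threshold) for group in row]
        adjacency.append([1 if any(w != 0.0 for w in g) else 0 for g in shrunk])
    return adjacency


# Toy example: two variables, two lags each.
coeffs = [
    [[0.8, 0.3], [0.05, -0.02]],  # target 0: strong self-lags, weak input from 1
    [[0.6, -0.4], [0.7, 0.1]],    # target 1: driven by both variables
]
graph = sparse_granger_graph(coeffs, threshold=0.2)
```

Here the weak group from variable 1 to variable 0 falls below the threshold and is set exactly to zero, so that edge is absent from `graph`. In an actual training loop this operator would be applied after each gradient step rather than once at the end.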