CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts

📅 2026-06-11
🤖 AI Summary
Existing neural Granger causal discovery methods are vulnerable to distribution shifts and regime changes in real-world time series, which often produce entangled representations and spurious causal relationships. To address this, the work proposes CausalMoE, a billion-scale multimodal Granger causal foundation model built around a pattern-routed mixture of heterogeneous experts that separates shared dynamics from regime-specific mechanisms. The model uses causality-aware self-attention across variables to produce sparse, interpretable Granger causal graphs, and aligns numerical signals with textual and visual priors from LLMs and VLMs to regularize causal estimation. The authors report a new state of the art on fully supervised benchmarks and effective generalization to few-shot settings where traditional methods fail.
📝 Abstract
Granger Causal Discovery (GCD) is fundamental for analyzing temporal dependencies in complex systems. However, existing neural GCD methods predominantly rely on a "one-size-fits-all" paradigm, struggling to capture distribution shifts and dynamic regime changes inherent in real-world time series. This often leads to entangled representations and spurious causal graphs. In this paper, we propose CausalMoE, a billion-scale multimodal Granger causal foundation model that explicitly models patch-level heterogeneity. CausalMoE introduces a Pattern-Routed Mixture of Heterogeneous Experts, which dynamically identifies latent temporal patterns and routes patches to specialized domain experts, effectively decoupling regime-specific mechanisms from shared dynamics. To ensure interpretable graph recovery, we design a Causality-Aware Self-Attention mechanism operating across variables, yielding sparse Granger causal graphs via proximal optimization. Furthermore, CausalMoE is the first to integrate LLMs and VLMs to align numerical signals with textual and visual priors, regularizing causal estimation in complex scenarios. Extensive experiments demonstrate that CausalMoE establishes a new state of the art on fully supervised benchmarks, while effectively generalizing to few-shot settings where traditional methods fail.
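The abstract says sparse Granger causal graphs are obtained "via proximal optimization" but gives no further detail. As a rough illustration only, the sketch below applies the standard proximal operator of an L1 penalty (elementwise soft-thresholding) to a small variable-by-variable score matrix and reads off a binary graph. The score values, the threshold `lam`, the function names, and the convention that entry `[i][j]` measures the influence of variable `j` on variable `i` are all assumptions made for this example, not details taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def soft_threshold(scores: Matrix, lam: float) -> Matrix:
    """Elementwise proximal operator of lam * |a|: shrink toward zero by lam."""
    out: Matrix = []
    for row in scores:
        new_row = []
        for a in row:
            mag = max(abs(a) - lam, 0.0)
            # Keep the sign of the original entry; small entries become exactly 0.
            new_row.append(math.copysign(mag, a) if mag > 0.0 else 0.0)
        out.append(new_row)
    return out


def granger_graph(scores: Matrix) -> List[List[int]]:
    """Binary adjacency: 1 where an off-diagonal entry is nonzero (self-loops ignored)."""
    return [
        [1 if i != j and a != 0.0 else 0 for j, a in enumerate(row)]
        for i, row in enumerate(scores)
    ]


# Hypothetical cross-variable scores for three variables (assumed values).
scores = [
    [0.90, 0.05, 0.40],
    [0.02, 0.80, 0.03],
    [0.30, 0.01, 0.70],
]

# Assumed threshold; a larger lam gives a sparser graph.
graph = granger_graph(soft_threshold(scores, lam=0.1))
for row in graph:
    print(row)
```

With these assumed values, the weak entries are shrunk to exactly zero, so only the stronger off-diagonal scores survive as edges. In a trained model the threshold would typically be tied to the regularization weight and learning rate of the proximal gradient step.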
Problem

Research questions and friction points this paper is trying to address.

Granger Causal Discovery
distribution shift
dynamic regime change
spurious causal graphs
time series
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pattern-Routed Mixture of Experts
Granger Causal Discovery
Causality-Aware Self-Attention
Multimodal Foundation Model
Heterogeneous Time Series
👥 Authors
Bo Liu
State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University, Beijing, China
Di Dai
State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University, Beijing, China
Jingwei Liu
Carnegie Mellon University; Chinese Academy of Sciences; Tsinghua University
MEMS, BioMEMS, 3D-IC, CMOS, Semiconductor
Jiarui Jin
Xiaohongshu; Shanghai Jiao Tong University; University College London
Multimodal Mining, Recommender System, Information Retrieval, Large Language Model
Xiaocheng Fang
State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University, Beijing, China
Guangkun Nie
State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University, Beijing, China
Hongyan Li
State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University, Beijing, China
Shenda Hong
Assistant Professor, Peking University
AI ECG, Biosignal, AI for Digital Health, Health Data Science, AI for Healthcare