Mask2Cause: Causal Discovery via Adjacency Constrained Causal Attention

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing deep learning approaches to causal discovery in time series struggle to model shared system dynamics and, when the graph is extracted post hoc, risk overfitting to spurious correlations. This work proposes Mask2Cause, an end-to-end framework that recovers the underlying causal graph directly during the forecasting forward pass. It combines an Inverted Variable Embedding with an Adjacency-Constrained Masked Attention mechanism and is trained with homoscedastic or heteroscedastic objectives, so that causal influences on both the mean and the variance are captured. On benchmarks ranging from synthetic chaotic dynamics to realistic biological simulations, Mask2Cause achieves state-of-the-art causal discovery with significantly fewer parameters than standard baselines. The inferred causal structures can also be used to reduce the parameter count of forecasting models by more than 70% on average while maintaining predictive accuracy.
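To make the mechanism concrete, here is a minimal NumPy sketch of attention over variable tokens restricted by an adjacency mask. It is an illustration under stated assumptions, not the authors' implementation: the function name `masked_variable_attention`, the single-head layout, the fixed binary `adj` matrix (the paper learns the graph), and the forced self-loops are all hypothetical choices made for this example.

```python
import numpy as np

def masked_variable_attention(window, W_embed, W_q, W_k, W_v, adj):
    """Single-head attention over variable tokens, restricted by an adjacency mask.

    window : (T, N) array, T time steps of N variables.
    adj    : (N, N) binary array; adj[i, j] = 1 means variable j may influence i.
    All names and shapes are assumptions made for this sketch.
    """
    n_vars = window.shape[1]
    # Inverted embedding: each variable's whole history becomes one token.
    tokens = window.T @ W_embed                      # (N, d)
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    scores = q @ k.T / np.sqrt(q.shape[1])           # (N, N)
    # Assumption: self-loops are always allowed so no row is fully masked.
    allowed = adj.astype(bool) | np.eye(n_vars, dtype=bool)
    # Mask before the softmax so excluded variables also leave the normaliser.
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, N, d = 8, 3, 4
window = rng.normal(size=(T, N))
W_embed = 0.1 * rng.normal(size=(T, d))
W_q, W_k, W_v = (0.1 * rng.normal(size=(d, d)) for _ in range(3))

# Hypothetical graph: variable 2 is not a parent of variable 0 (and vice versa).
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]])

out, weights = masked_variable_attention(window, W_embed, W_q, W_k, W_v, adj)

# Perturb variable 2 only and recompute.
perturbed = window.copy()
perturbed[:, 2] += 5.0
out_p, _ = masked_variable_attention(perturbed, W_embed, W_q, W_k, W_v, adj)
```

Because the mask is applied before the softmax, the output row for variable 0 is unaffected by the perturbation of variable 2, whereas the row for variable 1, which is allowed to attend to variable 2, changes. In this single layer, a zero entry in the adjacency therefore removes the corresponding influence from the forward pass.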

📝 Abstract

Leveraging deep learning for causal discovery in time series remains challenging because existing neural methods predominantly rely on component-wise architectures that fail to capture shared system dynamics, or employ decoupled post-hoc graph extraction that risks overfitting to spurious correlations. We propose Mask2Cause, an end-to-end framework that recovers the underlying causal graph directly during the forecasting forward pass. Our approach introduces an Inverted Variable Embedding and an Adjacency-Constrained Masked Attention mechanism, trained with homoscedastic or heteroscedastic objectives to capture causal influences in both mean and variance. Empirical results on diverse benchmarks, from synthetic chaotic dynamics to realistic biological simulations, demonstrate state-of-the-art causal discovery with significantly reduced parameter complexity compared to standard baselines. We further show that the inferred causal structures can be used to reduce the parameter count of forecasting models by more than 70% on average while maintaining predictive accuracy.
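The abstract does not spell out the two training objectives. One common reading, assumed here purely for illustration, is a mean-squared error for the homoscedastic case and a Gaussian negative log-likelihood with a predicted log-variance for the heteroscedastic case. The function names below are hypothetical.

```python
import numpy as np

def homoscedastic_loss(y, mean):
    """Mean-squared error: a Gaussian likelihood with one fixed, shared variance."""
    return float(np.mean((y - mean) ** 2))

def heteroscedastic_loss(y, mean, log_var):
    """Gaussian negative log-likelihood (additive constant dropped).

    The model predicts a log-variance per output, so inputs that change the
    spread of the target, not only its mean, affect the loss.
    """
    return float(np.mean(0.5 * (log_var + (y - mean) ** 2 / np.exp(log_var))))

y = np.array([2.0, -2.0])
mean = np.zeros(2)

mse = homoscedastic_loss(y, mean)
nll_unit_var = heteroscedastic_loss(y, mean, np.zeros(2))
nll_fitted_var = heteroscedastic_loss(y, mean, np.full(2, np.log(4.0)))
```

With unit variance the heteroscedastic loss reduces to half the mean-squared error, and it drops further when the predicted variance matches the squared residual. That is the sense in which such an objective can expose a dependence that acts only on the variance.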

Problem

Research questions and friction points this paper is trying to address.

causal discovery

time series

spurious correlations

shared system dynamics

graph extraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

causal discovery

masked attention

adjacency constraint