🤖 AI Summary
This study investigates the causal mechanisms by which batch size affects generalization in deep learning models trained on graph and text data. We propose HGCNet, the first framework to model higher-order interactions among training-dynamics variables via causal hypergraphs, integrating deep structural causal models (DSCMs) with do-calculus to quantify both the direct and mediated effects of batch size on gradient noise, flatness of minima, and model complexity. The approach delivers actionable causal interpretability, yielding theoretically grounded guidance for architecture design and optimization strategies. Evaluated on citation networks, biomedical text, and e-commerce review datasets, HGCNet significantly outperforms strong baselines—including GCN, GAT, PI-GNN, BERT, and RoBERTa—and its causal analysis shows that small batches improve generalization by enhancing gradient stochasticity and promoting convergence to flatter minima.
📝 Abstract
While the impact of batch size on generalisation is well studied in vision tasks, its causal mechanisms remain underexplored in graph and text domains. We introduce a hypergraph-based causal framework, HGCNet, that leverages deep structural causal models (DSCMs) to uncover how batch size influences generalisation via gradient noise, minima sharpness, and model complexity. Unlike prior approaches based on static pairwise dependencies, HGCNet employs hypergraphs to capture higher-order interactions across training dynamics. Using do-calculus, we quantify direct and mediated effects of batch size interventions, providing interpretable, causally grounded insights into optimisation. Experiments on citation networks, biomedical text, and e-commerce reviews show that HGCNet outperforms strong baselines including GCN, GAT, PI-GNN, BERT, and RoBERTa. Our analysis reveals that smaller batch sizes causally enhance generalisation through increased stochasticity and flatter minima, offering actionable interpretability to guide training strategies in deep learning. This work positions interpretability as a driver of principled architectural and optimisation choices beyond post hoc analysis.
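The abstract's central idea—separating the direct effect of a batch-size intervention from the effect mediated through gradient noise—can be illustrated with a toy simulation. This is a minimal sketch, not the paper's method: the structural equations, coefficients, and the 1/√B noise assumption below are all hypothetical, chosen only to show how clamping a mediator under do-interventions decomposes a total effect.

```python
# Toy linear SCM: batch size B -> gradient noise G (mediator) -> generalisation gap Y,
# plus a direct path B -> Y. All equations and coefficients are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # Monte Carlo samples per intervention

def simulate(batch, noise_fixed=None):
    """Mean generalisation gap under do(B=batch).

    If noise_fixed is given, the mediator (gradient noise) is clamped,
    isolating the direct path B -> Y.
    """
    # Assumed mediator mechanism: gradient noise scales like 1/sqrt(B).
    noise = 1.0 / np.sqrt(batch) + 0.05 * rng.standard_normal(n)
    if noise_fixed is not None:
        noise = np.full(n, noise_fixed)
    # Assumed outcome mechanism: gap grows with batch size directly and
    # shrinks with gradient noise (signs matching the paper's finding that
    # stochasticity helps generalisation).
    gap = 0.002 * batch - 0.5 * noise + 0.05 * rng.standard_normal(n)
    return gap.mean()

small, large = 32, 512
total = simulate(large) - simulate(small)  # total effect of do(B=512) vs do(B=32)
# Clamp the mediator at its small-batch level to get the (controlled) direct effect.
clamp = 1.0 / np.sqrt(small)
direct = simulate(large, noise_fixed=clamp) - simulate(small, noise_fixed=clamp)
mediated = total - direct  # effect transmitted through gradient noise
print(f"total={total:.3f} direct={direct:.3f} mediated={mediated:.3f}")
```

In this toy model both components are positive: moving to a larger batch worsens the gap directly and additionally via reduced gradient noise, mirroring the qualitative claim that smaller batches help through increased stochasticity.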