InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the computational cost of conventional neural mutual information (MI) estimators, which must run a time-consuming optimization for each individual dataset and therefore struggle to meet real-time demands. The authors propose InfoAtlas, a framework that formulates MI estimation as a zero-shot inference task. A foundation model is pretrained on a large-scale synthetic dataset covering diverse dependence structures, after which InfoAtlas predicts MI for inputs of varying dimensions and sample sizes in a single forward pass. This departs from the prevailing paradigm of per-dataset optimization and yields a speedup of about two orders of magnitude (roughly $100\times$) while keeping estimation accuracy comparable to state-of-the-art neural estimators. InfoAtlas is also reported to generalize to complex real-world scenarios and to different data scales without retraining.
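To make the idea of synthetic pretraining data with known MI concrete, the following is a minimal sketch of how one labelled example could be generated. It is an illustration only, not the authors' data pipeline: the function names `gaussian_mi` and `sample_task`, and the choice of correlated Gaussians, are assumptions made here. It relies on the standard closed form for a bivariate Gaussian with correlation $\rho$, $I(X;Y) = -\tfrac{1}{2}\log(1-\rho^2)$, which adds up across independent coordinate pairs.

```python
import math
import random


def gaussian_mi(rho, dim):
    """Closed-form MI in nats when each of the `dim` coordinate pairs
    (X_i, Y_i) is an independent bivariate Gaussian with correlation rho."""
    return -0.5 * dim * math.log(1.0 - rho ** 2)


def sample_task(rng, n_samples, dim, rho):
    """Hypothetical helper: draw one synthetic dataset and its MI label.

    Returns (xs, ys, mi), where xs and ys are lists of n_samples vectors
    of length dim, and mi is the ground-truth mutual information.
    """
    scale = math.sqrt(1.0 - rho ** 2)
    xs, ys = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Y_i = rho * X_i + sqrt(1 - rho^2) * noise gives unit variance
        # and correlation rho with X_i.
        y = [rho * xi + scale * rng.gauss(0.0, 1.0) for xi in x]
        xs.append(x)
        ys.append(y)
    return xs, ys, gaussian_mi(rho, dim)


rng = random.Random(0)
xs, ys, mi = sample_task(rng, n_samples=2000, dim=4, rho=0.8)
print(f"samples={len(xs)}, dim={len(xs[0])}, true MI={mi:.4f} nats")
```

A model trained on many such (dataset, label) pairs, with the dimension, sample size, and dependence family varied, would be one way to realize the pretraining setup the summary describes. The actual families of dependence structures used for InfoAtlas are not specified here.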

📝 Abstract

Measuring statistical dependence between high-dimensional random variables is a fundamental task in data science and machine learning. Neural mutual information (MI) estimators offer a promising avenue, but they typically require costly iterative optimization for each new dataset, making them impractical for real-time applications. We present InfoAtlas, a foundation model-like architecture that eliminates this bottleneck by directly inferring MI in a single forward pass. Pretrained on large-scale synthetic data with rich dependence patterns, InfoAtlas learns to identify diverse dependence structures and predict MI directly from the dataset. Comprehensive experiments demonstrate that InfoAtlas matches state-of-the-art neural estimators in accuracy while achieving a $100\times$ speedup, flexibly handles varying dimensions and sample sizes with a single unified model, and generalizes effectively to complex, real-world scenarios. By reformulating MI estimation as an inference task, InfoAtlas establishes a foundation for real-time dependence analysis.
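For reference, the quantity being estimated is the standard mutual information between random variables $X$ and $Y$ with joint density $p(x,y)$ and marginals $p(x)$, $p(y)$. The notation below is the textbook definition and is not taken from the paper itself:

$$
I(X;Y) \;=\; \mathbb{E}_{p(x,y)}\!\left[\log \frac{p(x,y)}{p(x)\,p(y)}\right] \;=\; D_{\mathrm{KL}}\big(p(x,y)\,\|\,p(x)\,p(y)\big) \;\ge\; 0,
$$

with equality if and only if $X$ and $Y$ are independent. Neural estimators typically optimize a variational bound on this quantity separately for each dataset, which is the per-dataset cost the abstract refers to.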

Problem

Research questions and friction points this paper is trying to address.

statistical dependence

mutual information estimation

high-dimensional data

real-time application

zero-shot

Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model

mutual information estimation

zero-shot inference