LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence

📅 2025-09-03
🤖 AI Summary
This work addresses the limitation of existing structured-data modeling approaches—namely, their reliance on task-specific architectures and fine-tuning—by proposing LimiX, a foundation model for general-purpose intelligence over structured data. LimiX models structured data as the joint distribution of variables and missingness patterns, enabling unified conditional prediction across diverse tasks—including classification, regression, imputation, and synthetic data generation—via query-based inference. Its key innovations are (i) masked joint-distribution pretraining and (ii) a context-aware conditional prediction mechanism, which together support training-free adaptation and rapid cross-dataset generalization. Evaluated on ten large-scale benchmarks, LimiX consistently outperforms gradient-boosted trees, deep tabular models, state-of-the-art tabular foundation models, and AutoML systems. It is the first model to achieve universal, zero-shot, multi-task inference over structured data with a single architecture and no task-specific fine-tuning.

📝 Abstract
We argue that progress toward general intelligence requires complementary foundation models grounded in language, the physical world, and structured data. This report presents LimiX, the first installment of our large structured-data models (LDMs). LimiX treats structured data as a joint distribution over variables and missingness, and is thus capable of addressing a wide range of tabular tasks through query-based conditional prediction with a single model. LimiX is pretrained using masked joint-distribution modeling with an episodic, context-conditional objective, where the model predicts for query subsets conditioned on dataset-specific contexts, supporting rapid, training-free adaptation at inference. We evaluate LimiX across 10 large structured-data benchmarks spanning broad regimes of sample size, feature dimensionality, class count, categorical-to-numerical feature ratio, missingness, and sample-to-feature ratio. With a single model and a unified interface, LimiX consistently surpasses strong baselines including gradient-boosting trees, deep tabular networks, recent tabular foundation models, and automated ensembles, as shown in Figure 1 and Figure 2. The superiority holds across a wide range of tasks, such as classification, regression, missing value imputation, and data generation, often by substantial margins, while avoiding task-specific architectures or bespoke training per task. All LimiX models are publicly accessible under Apache 2.0.
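The episodic, context-conditional pretraining setup described in the abstract can be made concrete with a small sketch: each episode splits a table into a context set and a query set, masks a random fraction of query cells, and asks the model to predict the masked values conditioned on the context. This is a minimal illustration only; the function name `make_episode` and its parameters are assumptions for exposition, not the authors' actual API or training code.

```python
import numpy as np

def make_episode(X, n_context, mask_frac, rng):
    """Build one pretraining episode for masked joint-distribution modeling.

    Splits the rows of table X into a context set and a query set, then masks
    a random fraction of query cells (marked as NaN). A model trained with
    this objective would predict the masked values conditioned on the context.
    Illustrative sketch only, not the paper's implementation.
    """
    n = X.shape[0]
    perm = rng.permutation(n)
    ctx_idx, qry_idx = perm[:n_context], perm[n_context:]
    context = X[ctx_idx].copy()
    query = X[qry_idx].copy()
    mask = rng.random(query.shape) < mask_frac   # cells the model must predict
    targets = query[mask].copy()                 # ground truth for the loss
    query[mask] = np.nan                         # NaN marks "missing/unknown"
    return context, query, mask, targets

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
context, query, mask, targets = make_episode(X, n_context=6, mask_frac=0.3, rng=rng)
```

Because missingness is part of the modeled distribution, the same NaN marker serves both for genuinely missing cells and for cells the model is asked to predict at inference time.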
Problem

Research questions and friction points this paper is trying to address.

Modeling structured data as a joint distribution for tabular tasks
Enabling training-free adaptation via query-based conditional prediction
Surpassing specialized models across diverse data benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint distribution modeling for structured data
Masked pretraining with context-conditional objective
Unified model for multiple tabular tasks
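The "unified model" idea above is that every task, whether regression, classification, or imputation, is phrased as the same query: fill the NaN cells of a query row conditioned on a context table. The sketch below illustrates that interface with a trivial column-mean baseline standing in for the actual model; `predict_masked` and all names are hypothetical, and the column-mean rule is only a placeholder for LimiX's learned conditional prediction.

```python
import numpy as np

def predict_masked(context, query_row):
    """Toy stand-in for query-based conditional prediction: fill the NaN
    cells of query_row given the context table. Here a column-mean baseline
    replaces the learned model; the interface, not the rule, is the point."""
    out = query_row.copy()
    missing = np.isnan(out)
    out[missing] = np.nanmean(context, axis=0)[missing]
    return out

context = np.array([[1.0, 2.0, 0.0],
                    [3.0, 4.0, 1.0],
                    [5.0, 6.0, 0.0]])

# Regression-as-imputation: mask the target column of a new row and fill it.
query = np.array([2.0, 3.0, np.nan])
filled = predict_masked(context, query)
```

Masking the label column of an unseen row recovers supervised prediction, while masking arbitrary feature cells recovers imputation, which is why a single interface can serve all the tasks listed above.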
Authors

Xingxuan Zhang
Postdoctoral Research Scientist at Department of Computer Science, Tsinghua University
computer vision, OOD Generalization, Domain Generalization, Optimization

Gang Ren
Stable AI & Tsinghua University

Han Yu
Stable AI & Tsinghua University

Hao Yuan
Research Scientist, Meta Platforms, Inc.
Deep Learning

Hui Wang
Stable AI & Tsinghua University

Jiansheng Li
Stable AI & Tsinghua University

Jiayun Wu
Carnegie Mellon University
Machine Learning

Lang Mo
Stable AI & Tsinghua University

Li Mao
Stable AI & Tsinghua University

Mingchao Hao
Stable AI & Tsinghua University

Ningbo Dai
Stable AI & Tsinghua University

Renzhe Xu
Assistant Professor of Computer Science, Shanghai University of Finance and Economics
Algorithmic Game Theory, Sequential Decision Making

Shuyang Li
Stable AI & Tsinghua University

Tianyang Zhang
Stable AI & Tsinghua University

Yue He
Tsinghua University
causal inference

Yuanrui Wang
Stable AI & Tsinghua University

Yunjia Zhang
Stable AI & Tsinghua University

Zijing Xu
Stable AI & Tsinghua University

Dongzhe Li
Stable AI & Tsinghua University

Fang Gao
Stable AI & Tsinghua University

Hao Zou
Stable AI & Tsinghua University

Jiandong Liu
Stable AI & Tsinghua University

Jiashuo Liu
Tsinghua University
Robust Optimization, OOD Generalization, Data-Centric AI

Jiawei Xu
Stable AI & Tsinghua University

Kaijie Cheng
Stable AI & Tsinghua University