Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative pre-training algorithms are constrained by two dominant paradigms, autoregressive models for discrete signals and diffusion models for continuous signals, which struggle to jointly achieve strong multimodal modeling capability and efficient inference, creating scalability bottlenecks. To address this, we propose an "inference-first" paradigm that prioritizes scalability across sequence length and refinement steps. We introduce Inductive Moment Matching (IMM) into generative pre-training for the first time, yielding a single-stage, inherently stable generation algorithm that does not require long multi-step sampling chains. Our method unifies IMM theory, improved diffusion mechanisms, and inference-time temporal modeling within an end-to-end training framework driven by multimodal data. Experiments demonstrate sample quality comparable to advanced diffusion models, inference accelerated by over 10×, and significantly improved multimodal generation consistency and training stability.

📝 Abstract
Recent years have seen significant advancements in foundation models through generative pre-training, yet algorithmic innovation in this space has largely stagnated around autoregressive models for discrete signals and diffusion models for continuous signals. This stagnation creates a bottleneck that prevents us from fully unlocking the potential of rich multi-modal data, which in turn limits the progress on multimodal intelligence. We argue that an inference-first perspective, which prioritizes scaling efficiency during inference time across sequence length and refinement steps, can inspire novel generative pre-training algorithms. Using Inductive Moment Matching (IMM) as a concrete example, we demonstrate how addressing limitations in diffusion models' inference process through targeted modifications yields a stable, single-stage algorithm that achieves superior sample quality with over an order of magnitude greater inference efficiency.
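The abstract's "over an order of magnitude greater inference efficiency" comes from replacing a long diffusion sampling chain with a few refinement steps. A minimal sketch of that accounting, assuming a generic iterative sampler and a dummy stand-in network (this is illustrative only, not the paper's IMM algorithm; `dummy_model` and the step counts are assumptions):

```python
# Illustrative sketch: counting network function evaluations (NFEs)
# to show why few-step samplers yield a >10x inference speedup over
# many-step diffusion sampling. Not the paper's IMM implementation.

def sample(model, steps, x0=0.0):
    """Generic iterative sampler: one model call per refinement step."""
    x, nfe = x0, 0
    for _ in range(steps):
        x = model(x)   # each refinement step costs one forward pass
        nfe += 1
    return x, nfe

dummy_model = lambda x: 0.5 * x + 1.0   # stand-in for a trained network

_, nfe_diffusion = sample(dummy_model, steps=1000)  # typical diffusion chain
_, nfe_few_step = sample(dummy_model, steps=8)      # hypothetical few-step sampler

print(nfe_diffusion / nfe_few_step)  # 125.0
```

Since inference cost is dominated by forward passes through the network, cutting the step count from hundreds or thousands to single digits is what "inference-first" scaling buys, independent of model size.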
Problem

Research questions and friction points this paper is trying to address.

Addressing stagnation in generative pre-training algorithms.
Improving inference efficiency for multi-modal data processing.
Enhancing sample quality and stability in diffusion models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference-first perspective enhances scaling efficiency
Inductive Moment Matching improves diffusion models
Single-stage algorithm boosts inference efficiency