AI Summary
This work addresses the lack of theoretical foundations and limited generalization in model-guided training. We propose a theory-driven paradigm based on Distributionally Robust Optimization (DRO). First, we introduce the DRRho risk minimization framework: its theoretical analysis reveals how reference models enhance target-model generalization via data reweighting or filtering, and yields improved scaling laws. Second, we develop DRRho-CLIP, a novel pretraining method that integrates DRO with contrastive learning to enable reference-model-guided robust multimodal representation learning. We theoretically prove that our framework substantially tightens the generalization bound. Empirically, DRRho-CLIP outperforms standard CLIP and heuristic model-guided approaches across diverse downstream tasks, demonstrating superior generalization, data efficiency, and out-of-distribution robustness.
Abstract
This paper formalizes an emerging learning paradigm that uses a trained model as a reference to guide and enhance the training of a target model through strategic data selection or weighting, named $\textbf{model steering}$. While ad-hoc methods have been used in various contexts, including the training of large foundation models, the paradigm's underlying principles remain insufficiently understood, leading to sub-optimal performance. In this work, we propose a theory-driven framework for model steering called $\textbf{DRRho risk minimization}$, which is rooted in Distributionally Robust Optimization (DRO). Through a generalization analysis, we provide theoretical insights into why this approach improves generalization and data efficiency compared to training without a reference model. To the best of our knowledge, this is the first time such theoretical insights have been provided for this new learning paradigm, which significantly enhances our understanding and practice of model steering. Building on these insights and the connection between contrastive learning and DRO, we introduce a novel method for Contrastive Language-Image Pretraining (CLIP) with a reference model, termed DRRho-CLIP. Extensive experiments validate the theoretical insights, reveal a superior scaling law compared to CLIP without a reference model, and demonstrate its strength over existing heuristic approaches.
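To make the reweighting idea concrete, below is a minimal sketch of reference-model-guided data reweighting in the KL-regularized DRO style: each example's weight is a softmax over its excess loss (target-model loss minus reference-model loss), so examples the reference model already handles well are down-weighted. The function name `drrho_weights` and the temperature parameter `tau` are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def drrho_weights(target_losses, ref_losses, tau=1.0):
    """Illustrative sketch: KL-regularized DRO-style weights based on
    the excess loss relative to a reference model.

    target_losses: per-example losses under the model being trained
    ref_losses:    per-example losses under the trained reference model
    tau:           temperature; smaller tau concentrates weight on the
                   examples with the largest excess loss
    """
    excess = np.asarray(target_losses, dtype=float) - np.asarray(ref_losses, dtype=float)
    # Softmax of excess loss (shifted by the max for numerical stability).
    z = np.exp((excess - excess.max()) / tau)
    return z / z.sum()

# Example: the second sample has the largest gap between target and
# reference loss, so it receives the largest weight.
w = drrho_weights([2.0, 3.0, 1.0], [1.5, 1.0, 0.9])
```

As `tau` grows, the weights approach the uniform distribution and the objective reduces to ordinary empirical risk minimization; as `tau` shrinks, training focuses on the examples where the target model lags the reference model the most.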