What Makes a Strong Model? A Unified Spectral Analysis of Knowledge Transfer over High-dimensional Linear Regression

📅 2026-05-31

🤖 AI Summary
This work addresses the lack of a unified theoretical understanding of knowledge transfer mechanisms, such as knowledge distillation and weak-to-strong generalization, by proposing a spectral framework within high-dimensional linear regression. By analyzing the dynamics of stochastic gradient descent (SGD), the study identifies two mechanisms: “spectral horizon expansion” in knowledge distillation, which lets the student capture high-frequency signals that are otherwise statistically inaccessible, and “spectral denoising” in weak-to-strong generalization, where the student filters out optimization noise. Together, these mechanisms show that the efficacy of transfer is governed by the interplay between implicit regularization and heterogeneous learning speeds across the spectrum.
📝 Abstract
Teacher-Student Knowledge Transfer (KT) is ubiquitous in modern machine learning, ranging from classical model compression via Knowledge Distillation (KD) to the emergent phenomenon of Weak-to-Strong (W2S) generalization. While existing studies offer isolated insights, a unified theoretical framework explaining the efficacy of KT across these disparate regimes remains lacking. In this work, we establish a unified spectral analysis of SGD dynamics in high-dimensional linear regression, elucidating the efficiency of KT across seemingly disparate regimes. We characterize KT efficiency through two distinct mechanisms: Spectral Horizon Expansion in KD, which enables the capture of statistically inaccessible high-frequency signals, and Spectral Denoising in W2S, where the student acts as a filter for optimization noise. Our framework unifies these phenomena, revealing that the efficacy of transfer is governed by the interplay between implicit regularization and heterogeneous spectral learning speeds over the spectrum.
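The phrase "heterogeneous spectral learning speeds" can be made concrete with a small sketch. For one-pass SGD on well-specified linear regression, the expected error of the iterate along an eigendirection with eigenvalue λ shrinks by a factor (1 − ηλ) per step, so large-eigenvalue directions are learned quickly while small-eigenvalue (high-frequency) directions barely move. The code below is only an illustration under assumed settings and is not the paper's construction: the power-law spectrum, dimension, step size, noise level, and the function names `expected_residual` and `run_sgd` are all hypothetical choices.

```python
import numpy as np

def expected_residual(eigs, lr, steps):
    """Fraction of the initial error left along each eigendirection
    of the *expected* SGD iterate after `steps` updates."""
    return (1.0 - lr * eigs) ** steps

def run_sgd(eigs, w_star, lr, steps, noise_std=0.1, seed=0):
    """One-pass SGD on least squares with x ~ N(0, diag(eigs))
    and labels y = <x, w_star> + Gaussian noise (assumed model)."""
    rng = np.random.default_rng(seed)
    w = np.zeros_like(w_star)
    for _ in range(steps):
        x = rng.normal(size=eigs.shape) * np.sqrt(eigs)  # fresh sample
        y = x @ w_star + noise_std * rng.normal()         # noisy label
        w -= lr * (x @ w - y) * x                         # SGD step
    return w

# Hypothetical setup: power-law spectrum lambda_i = i^(-2), all-ones target.
d = 50
eigs = np.arange(1, d + 1, dtype=float) ** -2.0
w_star = np.ones(d)
lr, steps = 0.1, 2000

residual = expected_residual(eigs, lr, steps)
w = run_sgd(eigs, w_star, lr, steps)

print("expected residual, top direction :", residual[0])
print("expected residual, tail direction:", residual[-1])
print("SGD estimate, top direction      :", w[0])
```

In this toy setting the top direction is essentially learned while the tail direction retains most of its initial error, which is the kind of spectral cutoff that the abstract's "spectral horizon" language refers to.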
Problem

Research questions and friction points this paper is trying to address.

Knowledge Transfer
Unified Framework
Spectral Analysis
High-dimensional Linear Regression
Theoretical Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Horizon Expansion
Spectral Denoising
Knowledge Transfer
High-dimensional Linear Regression
Implicit Regularization
Wendao Wu
State Key Lab of General AI, School of Intelligence Science and Technology, Peking University, China; School of Mathematical Sciences, Peking University, China
Fangqing Zhang
State Key Lab of General AI, School of Intelligence Science and Technology, Peking University, China
Haihan Zhang
State Key Lab of General AI, School of Intelligence Science and Technology, Peking University, China
Cong Fang
Peking University
machine learning, optimization, statistics