Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher

📅 2026-05-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses weak-to-strong generalization under label scarcity by framing it as a data selection problem, and proposes trust functions for filtering weak supervision. Each weakly labeled example is assigned a scalar trust score, and only high-trust examples are used to train a stronger student model. The approach extends naturally to an iterative teacher-student chain, in which each trained student becomes the next teacher, compounding gains across generations. Empirically, the resulting students match and sometimes surpass ground-truth supervision across diverse tasks, including world knowledge, quantitative reasoning, and strategy games, achieving near-lossless weak-to-strong generalization.
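The filtering step can be illustrated with a minimal sketch. It assumes the trust function is supplied as a callable that scores an (input, weak label) pair. The summary does not say how trust scores are computed or how the cutoff is chosen, so the (input, label) tuple representation, the `trust_filter` name, and the `keep_fraction` cutoff are hypothetical choices for illustration, not the authors' method.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (input, weak_label) pairs produced by the weak teacher.
Example = Tuple[str, str]


def trust_filter(
    examples: Sequence[Example],
    trust_fn: Callable[[str, str], float],
    keep_fraction: float = 0.5,
) -> List[Example]:
    """Keep the top keep_fraction of weakly labeled examples by trust score.

    trust_fn and keep_fraction are illustrative assumptions; any scalar
    scorer and any cutoff rule (fraction or threshold) could be used.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Score every weak label with the trust function.
    scored = [(trust_fn(x, y), (x, y)) for x, y in examples]
    # Highest trust first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at least one example when any are available.
    k = max(1, round(keep_fraction * len(scored))) if scored else 0
    return [ex for _, ex in scored[:k]]
```

As a usage example, if a stand-in trust function returns 0.95, 0.40, 0.80, and 0.10 for four examples, `keep_fraction=0.5` keeps the first and third. Only the retained pairs would then be used as training targets for the student.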
📝 Abstract
Weak-to-strong generalization studies how to improve a strong student using supervision from a weaker teacher when reliable labels are scarce. We view this primarily as a data selection problem, where the key challenge is to identify which weak labels are reliable enough to serve as a training signal. To address this, we introduce trust functions that assign each weak label a scalar trust score and use these scores to filter weak supervision. Across several domains, including world knowledge, quantitative reasoning, and strategy games, trust filtering yields students that match and sometimes surpass ground-truth supervision, achieving near-lossless weak-to-strong generalization. Moreover, trust functions enable an iterative weak-to-strong chain that compounds gains: a trained student is reused as the teacher for the next generation. The advantage of trust functions can be attributed to several mechanisms.
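The iterative chain can be sketched as a loop under stated assumptions. Models are treated as plain callables from input to label. `train_student` is a hypothetical training routine supplied by the caller, and filtering uses a hypothetical fixed trust `threshold`. None of these names come from the paper; the sketch shows only the control flow of reusing each student as the next teacher.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical: a model maps an input string to a predicted label.
Model = Callable[[str], str]


def weak_to_strong_chain(
    initial_teacher: Model,
    unlabeled: Sequence[str],
    train_student: Callable[[List[Tuple[str, str]]], Model],
    trust_fn: Callable[[str, str], float],
    generations: int = 3,
    threshold: float = 0.5,
) -> List[Model]:
    """Run a trust-filtered teacher -> student chain; return the student of each generation."""
    teacher = initial_teacher
    students: List[Model] = []
    for _ in range(generations):
        # 1. The current teacher produces weak labels for the unlabeled pool.
        weak = [(x, teacher(x)) for x in unlabeled]
        # 2. Keep only weak labels whose trust score clears the (assumed) threshold.
        trusted = [(x, y) for x, y in weak if trust_fn(x, y) >= threshold]
        # 3. Train a new student on the trusted subset.
        student = train_student(trusted)
        students.append(student)
        # 4. The student becomes the teacher for the next generation.
        teacher = student
    return students
```

In a toy run, `train_student` could memorize its trusted pairs and fall back to its own prediction on unseen inputs. A wrong teacher label that is filtered out in the first generation is then replaced by the student's prediction, and later generations train on the corrected labels. This is one simple way the chain can compound gains; it does not reproduce the paper's experiments.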
Problem

Research questions and friction points this paper is trying to address.

weak-to-strong generalization
data selection
trust functions
weak supervision
reliable labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trust Functions
Weak-to-Strong Generalization
Data Selection
Trust Scoring
Iterative Knowledge Distillation