Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Weak supervisors often cannot provide reliable labels, preferences, or judgments for complex outputs, which limits weak-to-strong generalization and scalable oversight. This work proposes a "weak-critic strong oversight" setting, in which a weak model acts as a critic that supplies a non-misleading revision direction, helping a stronger model make better use of its own knowledge. The proposed method, progressive on-policy critique distillation (OPCD), combines weak-critic generation, high-quality critique filtering, an adaptive self-teacher mechanism, and alignment-aware training to distill critic-guided behavior into the strong model. Experiments on reasoning and alignment benchmarks show that the method improves strong models over training epochs, suggesting a viable path toward scalable oversight with weak supervision.

📝 Abstract

As large language models become stronger, weak supervisors may fail to provide reliable labels, preferences, or final judgments for complex outputs, limiting both weak-to-strong generalization and scalable oversight. We study a more tractable form of weak supervision: using a weak model as a critic rather than as a labeler or judge. Instead of solving the task or selecting the correct answer, the weak critic only needs to provide a non-misleading revision direction that helps the strong model better use its own knowledge. We call this setting *weak-critic strong oversight*. We first show that weak critiques can improve frozen strong models at inference time, and that critique quality is key to this improvement. We then propose progressive on-policy critique distillation (**OPCD**), which filters high-quality critiques and distills critic-guided behavior into the strong model through adaptive self-teacher signals. Experiments on reasoning and alignment benchmarks show that our method improves strong models over training epochs, suggesting an effective path for scalable oversight with weak supervision.
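The filter-then-distill idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how critique quality is measured or how the self-teacher signal is built, so the `Rollout` record, the `score` function, the `margin` threshold, and the toy reference answers below are all hypothetical stand-ins chosen only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Rollout:
    """Hypothetical record of one critique round (names are assumptions)."""
    prompt: str
    initial: str    # strong model's own (on-policy) first answer
    critique: str   # weak critic's comment on that answer
    revised: str    # strong model's answer after reading the critique


def filter_critiques(
    rollouts: List[Rollout],
    score: Callable[[str, str], float],
    margin: float = 0.0,
) -> List[Rollout]:
    """Keep rollouts whose revision beats the initial answer by more than `margin`.

    Assumption: a critique counts as "high quality" if it led to a
    better-scoring revision. The paper may use a different criterion.
    """
    return [
        r for r in rollouts
        if score(r.prompt, r.revised) - score(r.prompt, r.initial) > margin
    ]


def distillation_pairs(kept: List[Rollout]) -> List[Tuple[str, str]]:
    """Build (prompt, target) pairs; the critique is dropped from the input,
    so the student is trained to produce the revised answer directly."""
    return [(r.prompt, r.revised) for r in kept]


# Toy data and a toy scorer, for illustration only.
REFERENCE = {"2+2?": "4", "3*3?": "9"}


def toy_score(prompt: str, answer: str) -> float:
    # Stand-in quality signal: 1.0 on exact match with a toy reference.
    return 1.0 if REFERENCE.get(prompt) == answer else 0.0


rollouts = [
    Rollout("2+2?", "5", "Recheck the addition.", "4"),  # helpful critique
    Rollout("3*3?", "9", "Try adding instead.", "6"),    # misleading critique
]
kept = filter_critiques(rollouts, toy_score)
pairs = distillation_pairs(kept)
print(pairs)  # [('2+2?', '4')]
```

In this toy run the misleading critique is discarded because its revision scores lower than the initial answer, and only the helpful critique contributes a training pair. In a real setting the scorer would have to come from whatever weak signal is available, and the resulting pairs would feed a fine-tuning step that this sketch leaves out.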

Problem

Research questions and friction points this paper is trying to address.

scalable oversight

weak supervision

large language models

critique distillation

weak-to-strong generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

weak-critic supervision

critique distillation

scalable oversight