DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the sharp degradation in multilingual capability that small language models show at sub-billion scales, particularly for Southeast Asian (SEA) languages. The authors propose DuDi, a dual-signal distillation framework that combines an online sequence-level signal with off-policy and on-policy token-level signals. DuDi also introduces a cross-lingual verbalizer that refines teacher feedback and improves teacher–student transferability. On the SEA-HELM benchmark, DuDi consistently outperforms competitive distillation baselines across multiple model families, scales, and teacher–student configurations, and ablations indicate that the sequence-level, token-level, and verbalizer components provide complementary learning signals.

📝 Abstract

Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework that combines an online sequence-level signal with off-policy and on-policy token-level signals. DuDi further uses a cross-lingual verbalizer to refine teacher feedback and improve teacher-student transferability in multilingual settings. Experiments on SEA-HELM across multiple model families, scales, and teacher-student settings show that DuDi consistently outperforms competitive distillation baselines. Ablations and analyses confirm that sequence-level optimization, token-level supervision, and cross-lingual verbalization provide complementary and transferable learning signals for multilingual SLMs.
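
The abstract does not give DuDi's loss formulas, so the following is only a minimal sketch of how a sequence-level signal might be combined with off-policy and on-policy token-level signals. Everything in it is an assumption made for illustration: the function names, the choice of KL(teacher || student) as the token-level loss, the REINFORCE-style surrogate for the sequence-level term, the scalar `reward`, and the weights `w_off`, `w_on`, and `w_seq`. The cross-lingual verbalizer is not modeled here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_kl(teacher_logits, student_logits):
    """Mean per-position KL(teacher || student) over a sequence of logit vectors."""
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)
        q = softmax(s_row)
        total += sum(pi * (math.log(pi) - math.log(qi))
                     for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)

def sequence_loss(student_logits, sampled_ids, reward):
    """Assumed sequence-level surrogate: -reward * log-prob of the sampled sequence."""
    log_prob = sum(math.log(softmax(row)[tok])
                   for row, tok in zip(student_logits, sampled_ids))
    return -reward * log_prob

def dual_signal_loss(off_policy, on_policy, sampled_ids, reward,
                     w_off=1.0, w_on=1.0, w_seq=1.0):
    """Weighted sum of token-level and sequence-level terms (hypothetical weighting).

    off_policy: (teacher_logits, student_logits) scored on a fixed reference sequence.
    on_policy:  (teacher_logits, student_logits) scored on a student-sampled sequence.
    """
    token_term = w_off * token_kl(*off_policy) + w_on * token_kl(*on_policy)
    seq_term = w_seq * sequence_loss(on_policy[1], sampled_ids, reward)
    return token_term + seq_term

# Toy example: 2 positions, vocabulary of 3 tokens.
teacher = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
student = [[1.0, 0.8, 0.2], [0.4, 0.9, 0.6]]
loss = dual_signal_loss((teacher, student), (teacher, student),
                        sampled_ids=[0, 1], reward=0.5)
```

In this sketch the token-level terms pull the student's per-position distributions toward the teacher's, while the sequence-level term scales the log-probability of a whole sampled output by a single score, which is one common way to make the two kinds of signal complementary.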

Problem

Research questions and friction points this paper is trying to address.

small language models

multilingual capabilities

Southeast Asian languages

model distillation

cross-lingual transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-signal distillation

cross-lingual verbalizer

multilingual small language models