DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer

📅 2026-06-03

🤖 AI Summary
This work addresses the severe performance degradation of small language models in low-resource multilingual settings, such as Southeast Asian languages. The authors propose DuDi, a dual-signal distillation framework that combines online sequence-level supervision with off-policy and on-policy token-level signals. DuDi also introduces a cross-lingual verbalizer that refines teacher feedback, making the distilled signals more complementary and more transferable. Evaluated on the SEA-HELM benchmark, the method consistently outperforms existing distillation approaches across model architectures, scales, and teacher-student configurations.
๐Ÿ“ Abstract
Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework that combines an online sequence-level signal with off-policy and on-policy token-level signals. DuDi further uses a cross-lingual verbalizer to refine teacher feedback and improve teacher-student transferability in multilingual settings. Experiments on SEA-HELM across multiple model families, scales, and teacher-student settings show that DuDi consistently outperforms competitive distillation baselines. Ablations and analyses confirm that sequence-level optimization, token-level supervision, and cross-lingual verbalization provide complementary and transferable learning signals for multilingual SLMs.
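To make the idea of combining a token-level and a sequence-level signal concrete, here is a minimal, generic sketch in plain Python. It is not DuDi's actual objective: the abstract does not state the loss, so the forward KL term, the score-weighted sequence term, the mixing weight `alpha`, and the scalar `teacher_score` are all assumptions chosen for illustration. The cross-lingual verbalizer is not modeled.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_kl(teacher_logits, student_logits):
    """Forward KL(teacher || student) at one token position (assumed token-level signal)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sequence_term(student_logits_seq, sampled_tokens, teacher_score):
    """Score-weighted negative log-likelihood of a student-sampled sequence.

    teacher_score is a hypothetical scalar rating of the whole sequence.
    """
    log_prob = sum(
        math.log(softmax(logits)[tok])
        for logits, tok in zip(student_logits_seq, sampled_tokens)
    )
    return -teacher_score * log_prob

def dual_signal_loss(teacher_logits_seq, student_logits_seq,
                     sampled_tokens, teacher_score, alpha=0.5):
    """Weighted sum of the two illustrative signals (alpha is an assumed hyperparameter)."""
    kls = [token_kl(t, s) for t, s in zip(teacher_logits_seq, student_logits_seq)]
    token_part = sum(kls) / len(kls)
    seq_part = sequence_term(student_logits_seq, sampled_tokens, teacher_score)
    return alpha * token_part + (1 - alpha) * seq_part

# Toy example: two positions, vocabulary of three tokens.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
student = [[1.0, 1.0, 0.0], [0.0, 0.5, 0.5]]
loss = dual_signal_loss(teacher, student, sampled_tokens=[0, 1], teacher_score=0.8)
```

In this sketch the token-level term pulls the student's per-position distribution toward the teacher's, while the sequence-level term rewards whole student-generated sequences that the teacher rates highly. Whether the teacher logits come from fixed data or from student samples would distinguish the off-policy and on-policy variants.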
Problem

Research questions and friction points this paper is trying to address.

small language models
multilingual capabilities
Southeast Asian languages
model distillation
cross-lingual transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-signal distillation
cross-lingual verbalizer
multilingual small language models
sequence-level optimization
token-level supervision