AI Summary
This work addresses the significant performance degradation of small language models in low-resource multilingual settings, such as those found in Southeast Asia. To mitigate this issue, the authors propose DuDi, a dual-signal distillation framework that integrates online sequence-level supervision with off-policy and on-policy token-level signals. DuDi further introduces a cross-lingual verbalizer to refine teacher feedback, thereby enhancing the complementarity and transferability of distilled knowledge. Evaluated on the SEA-HELM benchmark, the method consistently outperforms existing distillation approaches across diverse model architectures, scales, and teacher-student configurations, demonstrating both its effectiveness and broad applicability.
Abstract
Small language models (SLMs) are efficient and scalable, but their multilingual capabilities degrade severely at sub-billion scales, especially for Southeast Asian (SEA) languages. We introduce DuDi, a dual-signal multilingual distillation framework that combines an online sequence-level signal with off-policy and on-policy token-level signals. DuDi further uses a cross-lingual verbalizer to refine teacher feedback and improve teacher-student transferability in multilingual settings. Experiments on SEA-HELM across multiple model families, scales, and teacher-student settings show that DuDi consistently outperforms competitive distillation baselines. Ablations and analyses confirm that sequence-level optimization, token-level supervision, and cross-lingual verbalization provide complementary and transferable learning signals for multilingual SLMs.