🤖 AI Summary
Multilingual large language models (LLMs) perform poorly on code-mixed language understanding and translation tasks. To address this, we propose CHAI, a novel framework that uses the LLM itself as an annotator to construct high-quality preference data, which then drives reinforcement learning from AI feedback (RLAIF) to improve the model's code-mixing capability end to end. Our key contributions are: (1) the first application of LLM-as-judge evaluation to generate fine-grained preference labels for code-mixed translation quality; (2) the construction of the first large-scale, high-quality code-mixed translation preference dataset; and (3) substantial improvements in cross-lingual generalization via multi-stage instruction tuning and RLAIF-based alignment. Experiments demonstrate that CHAI-enhanced models achieve a 25.66% higher human win rate over open-source state-of-the-art baselines on real-world code-mixed translation tasks, advancing cross-lingual inclusivity in multilingual LLMs.
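The LLM-as-judge step described above can be sketched as a pairwise comparison that turns candidate translations into chosen/rejected preference records. The sketch below is illustrative only: the function names and record format are assumptions, and the real judge would be an LLM call with a rubric prompt, here replaced by a toy stand-in heuristic.

```python
def judge(source, cand_a, cand_b):
    """Stand-in for the LLM judge: returns 'A' or 'B'.
    In the actual pipeline this would prompt an LLM to compare two
    candidate code-mixed translations of `source`; here a toy heuristic
    (prefer the longer candidate) keeps the sketch self-contained."""
    return "A" if len(cand_a) >= len(cand_b) else "B"

def build_preference_pairs(examples):
    """examples: iterable of (source, candidate_a, candidate_b) triples.
    Returns RLAIF-style preference records with 'chosen'/'rejected' fields
    (the field names are an assumption, not taken from the paper)."""
    pairs = []
    for src, a, b in examples:
        verdict = judge(src, a, b)
        chosen, rejected = (a, b) if verdict == "A" else (b, a)
        pairs.append({"prompt": src, "chosen": chosen, "rejected": rejected})
    return pairs
```

Because the judge is itself a model, this loop can label translations at a scale human annotation cannot match, which is what makes the downstream RLAIF stage feasible.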
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across various NLP tasks but struggle with code-mixed (or code-switched) language understanding. For example, prior work benchmarking multilingual LLMs on code-mixed translation has shown that current state-of-the-art models are ineffective at handling code-mixed languages. However, the question of how to improve this capability has received no attention to date. In this paper, we tackle this research gap by proposing CHAI, a novel general-purpose framework for improving the ability of multilingual LLMs to handle code-mixed languages. CHAI rests on three contributions. First, we explore the ability of LLMs to provide accurate annotations for code-mixed translation tasks. Second, we leverage LLMs as annotators to generate preference data for code-mixed translation at scale, which is then used within a reinforcement learning from AI feedback (RLAIF) procedure to improve LLMs' performance on code-mixed tasks. Third, we conduct a rigorous experimental evaluation across various real-world datasets and settings. Our analysis shows that CHAI-powered LLMs outperform state-of-the-art open-source LLMs by 25.66% (in terms of win rate adjudicated by human annotators) on code-mixed translation tasks. This work represents a first step towards developing more inclusive code-mixed LLMs.
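One common way AI-generated preference pairs are consumed in an RLAIF-style procedure is a direct-preference (DPO-style) objective; this is a generic illustration of that idea, not necessarily CHAI's exact training objective. All log-probability inputs and the `beta` temperature are assumptions for the sketch.

```python
import math

def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss on one preference pair (illustrative, not CHAI's
    published objective). Inputs are sequence log-probabilities of the
    chosen/rejected translations under the policy and a frozen reference
    model. The loss shrinks as the policy favors the chosen translation
    more than the reference does."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), written to avoid overflow for large margins
    return math.log1p(math.exp(-beta * margin)) if margin > 0 else \
        -beta * margin + math.log1p(math.exp(beta * margin))
```

With no preference signal (all log-probs equal) the loss sits at log 2; it falls toward zero as the policy separates chosen from rejected translations, closing the loop from AI feedback back to model weights.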