🤖 AI Summary
This work addresses a limitation of existing large language model (LLM)-based source-side rewriting approaches: they rely on manual prompt tuning tailored to each machine translation (MT) model, and therefore lack generality and automation. The authors propose RLSR (Reinforcement Learning for Source Rewriting), a framework that uses the improvement in downstream translation quality as the reinforcement learning reward to train a source-side rewriting model, removing the need to tune prompts per MT model. Across six MT models and 16 language pairs, 4B-parameter rewriting models trained with RLSR significantly outperform both the no-rewriting baseline and same-scale prompt-based rewriting baselines, and perform competitively with prompt-based rewriting built on a 235B-parameter LLM. The results position RLSR as a general, automated, and efficient mechanism for source-side rewriting.
📝 Abstract
Although directly prompting off-the-shelf Large Language Models (LLMs) to generate meaning-preserving source rewrites can effectively enhance Machine Translation (MT) quality, doing so requires manually tuning prompts for different MT models. In this work, we propose RLSR (Reinforcement Learning for Source Rewriting), a novel RL-based framework for training a source rewriting model without tuning prompts for each MT model. RLSR optimizes the rewriting model by directly using the improvement in downstream translation quality yielded by each rewritten source as the reward. Extensive experiments across six MT models and 16 language pairs demonstrate that our 4B rewriting models trained via RLSR significantly outperform the no-rewriting baseline and existing same-scale prompt-based rewriting baselines, while achieving competitive performance against prompt-based baselines built on a 235B LLM.
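The reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the reward is the plain difference between a quality score for the translation of the rewritten source and a quality score for the translation of the original source, with both scored against the original source. The names `rewrite_reward`, `toy_translate`, and `toy_quality` are hypothetical, and the toy translator and metric are stand-ins for a real MT model and a real quality estimator.

```python
from typing import Callable


def rewrite_reward(
    source: str,
    rewrite: str,
    translate: Callable[[str], str],
    quality: Callable[[str, str], float],
) -> float:
    """Improvement in translation quality obtained by translating `rewrite`
    instead of `source`.

    Assumption: both translations are scored against the ORIGINAL source,
    so a rewrite that changes the meaning gains nothing from doing so.
    """
    baseline_score = quality(source, translate(source))
    rewritten_score = quality(source, translate(rewrite))
    return rewritten_score - baseline_score


# Hypothetical stand-in for an MT model: a fixed lookup table.
def toy_translate(text: str) -> str:
    table = {
        "It is raining cats and dogs.": "Il pleut des chats et des chiens.",
        "It is raining very heavily.": "Il pleut très fort.",
    }
    return table[text]


# Hypothetical stand-in for a quality metric: penalizes the literal idiom.
def toy_quality(source: str, translation: str) -> float:
    return 0.25 if "chats" in translation else 0.75


reward = rewrite_reward(
    "It is raining cats and dogs.",
    "It is raining very heavily.",
    toy_translate,
    toy_quality,
)
print(reward)
```

In an RL setup of this kind, such a scalar would be computed for each rewrite sampled from the rewriting model and passed to the policy update. A rewrite identical to the source yields a reward of zero under this formulation.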