🤖 AI Summary
Problem: Large language models (LLMs) often fail to stay faithful to the provided context in information-seeking settings, and existing alignment approaches rely heavily on costly, human-annotated data. Method: This paper proposes CANOE, an annotation-free framework comprising: (1) synthetic short-form question-answering training data constructed from four diverse, easily verifiable tasks; (2) Dual-GRPO, a rule-based reinforcement learning method that jointly optimizes faithfulness for both short-form answers and long-form generation, avoiding over-optimization of short-form outputs when training only on synthesized short-form QA data; and (3) three tailored rule-based rewards derived from the synthesized data, eliminating the need to label preference data for reward-model training. Contribution/Results: CANOE unifies faithfulness alignment across short-form and long-form generation in a single framework and significantly outperforms strong baselines, including GPT-4o and OpenAI o1, across 11 downstream tasks, supporting the effectiveness and generalizability of the annotation-free paradigm.
📝 Abstract
Teaching large language models (LLMs) to be faithful to the provided context is crucial for building reliable information-seeking systems. We therefore propose a systematic framework, CANOE, to improve the faithfulness of LLMs in both short-form and long-form generation tasks without human annotations. Specifically, we first synthesize short-form question-answering (QA) data from four diverse tasks to construct high-quality, easily verifiable training data without human annotation. We also propose Dual-GRPO, a rule-based reinforcement learning method with three tailored rule-based rewards derived from the synthesized short-form QA data, which simultaneously optimizes both short-form and long-form response generation. Notably, Dual-GRPO eliminates the need to manually label preference data for training reward models and avoids over-optimizing short-form generation when relying only on the synthesized short-form QA data. Experimental results show that CANOE greatly improves the faithfulness of LLMs across 11 different downstream tasks, even outperforming the most advanced LLMs, e.g., GPT-4o and OpenAI o1.
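The abstract's appeal to "easily verifiable" synthesized QA data rests on the fact that rule-based rewards for short-form answers can be computed without a learned reward model. The paper's exact reward rules are not given here; as a minimal sketch, one such reward could be a SQuAD-style exact-match check against the gold answers of a synthesized question (the function names and normalization steps below are illustrative assumptions, not CANOE's actual implementation):

```python
import re
import string


def normalize(text: str) -> str:
    # SQuAD-style normalization (an assumed convention, not from the paper):
    # lowercase, drop punctuation, remove articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def short_form_reward(prediction: str, gold_answers: list[str]) -> float:
    # Binary rule-based reward: 1.0 if the normalized prediction exactly
    # matches any gold answer from the synthesized QA pair, else 0.0.
    # No reward model and no human preference labels are required.
    pred = normalize(prediction)
    return 1.0 if any(pred == normalize(gold) for gold in gold_answers) else 0.0
```

Because the reward is a deterministic rule over verifiable data, it can be plugged directly into a GRPO-style policy update; judging long-form responses faithfully is the harder part that Dual-GRPO's additional rewards are designed to address.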