Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs

📅 2025-03-04
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing response-level preference-based factuality alignment methods suffer from training noise because hallucinated and factual content are entangled within a single response. To address this, the paper proposes Mask-DPO, a fine-grained factuality alignment framework that uses sentence-level factuality annotations as a masking signal to guide DPO optimization, preventing factually correct segments in model responses from being erroneously penalized. The method integrates a mask-weighted loss, topic-aware data augmentation, and a scalable factuality evaluation mechanism. On the ANAH benchmark, Mask-DPO boosts the factual accuracy of Llama3.1-8B-Instruct from 49.19% to 77.53%, surpassing the 70B baseline; on the out-of-domain Biography task, FactScore improves from 30.29% to 39.39%, demonstrating markedly better generalization. This work establishes a sentence-level, mask-driven DPO paradigm for fine-grained, trustworthy factuality alignment.

📝 Abstract
Large language models (LLMs) exhibit hallucinations (i.e., unfaithful or nonsensical information) when serving as AI assistants in various domains. Since hallucinations are always interleaved with truthful content in LLM responses, previous factuality alignment methods that conduct response-level preference learning inevitably introduce noise during training. Therefore, this paper proposes a fine-grained factuality alignment method based on Direct Preference Optimization (DPO), called Mask-DPO. Incorporating sentence-level factuality as mask signals, Mask-DPO learns only from factually correct sentences in the preferred samples and avoids penalizing factual content in the non-preferred samples, which resolves the ambiguity in preference learning. Extensive experimental results demonstrate that Mask-DPO significantly improves the factuality of LLM responses to questions from both in-domain and out-of-domain datasets, even though these questions and their corresponding topics are unseen during training. Trained only on the ANAH train set, Llama3.1-8B-Instruct improves from 49.19% to 77.53% on the ANAH test set, surpassing even Llama3.1-70B-Instruct (53.44%), while its FactScore on the out-of-domain Biography dataset also improves from 30.29% to 39.39%. We further study the generalization property of Mask-DPO under different training-sample scaling strategies and find that scaling the number of topics in the dataset is more effective than scaling the number of questions. Based on the implications of this phenomenon, we offer a hypothesis about what factuality alignment does to LLMs and conduct proof-of-concept experiments to verify it. We hope the method and findings pave the way for future research on scaling factuality alignment.
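
The core mechanism is a masked variant of the DPO loss. Below is a minimal PyTorch sketch of how sentence-level masks might enter the objective, assuming per-sentence log-probabilities and binary factuality labels are already available; all function and argument names are hypothetical, and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def masked_dpo_loss(
    chosen_logps,        # (B, S) per-sentence policy log-probs, preferred response
    rejected_logps,      # (B, S) per-sentence policy log-probs, dispreferred response
    ref_chosen_logps,    # (B, S) per-sentence reference-model log-probs
    ref_rejected_logps,  # (B, S) per-sentence reference-model log-probs
    chosen_mask,         # (B, S) 1 = factually correct sentence, else 0
    rejected_mask,       # (B, S) 1 = hallucinated sentence, else 0
    beta: float = 0.1,
):
    """DPO loss where only factual sentences in the preferred response are
    rewarded and only hallucinated sentences in the dispreferred response
    are penalized."""
    # Aggregate per-sentence log-ratios over the sentences each mask keeps.
    chosen_ratio = ((chosen_logps - ref_chosen_logps) * chosen_mask).sum(-1)
    rejected_ratio = ((rejected_logps - ref_rejected_logps) * rejected_mask).sum(-1)
    # Standard DPO logistic loss on the masked implicit rewards.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```

Setting both masks to all-ones recovers vanilla DPO, which makes the masking an easy drop-in change to an existing DPO training loop.
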
Problem

Research questions and friction points this paper is trying to address.

Hallucinations in LLM responses undermine their reliability as AI assistants
Response-level preference learning entangles hallucinated and factual content, injecting noise into factuality alignment
Alignment gains often fail to generalize to unseen questions, topics, and domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained factuality alignment via Mask-DPO, a masked variant of DPO
Sentence-level factuality annotations as mask signals (see the sketch below)
Improves LLM factuality on both in-domain and out-of-domain datasets
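
As a concrete illustration of the second point above, sentence-level annotations can be turned into masks with a few lines of code. This is a hypothetical sketch; the actual format of ANAH-style annotator output is not specified here, and the tag names are assumptions.

```python
def sentence_mask(labels, preferred):
    """Convert per-sentence factuality tags into a 0/1 mask.

    labels: per-sentence tags from a sentence-level factuality annotator
            (tag names "factual" / "hallucination" are assumptions).
    preferred: True for the preferred response, False for the dispreferred one.
    """
    if preferred:
        # Learn only from factually correct sentences.
        return [1.0 if label == "factual" else 0.0 for label in labels]
    # Penalize only hallucinated sentences; factual ones are masked out.
    return [1.0 if label == "hallucination" else 0.0 for label in labels]
```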