Cross Paraphrastic Invariance Learning for Hallucination Detection

📅 2026-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses hallucination detection in large language model (LLM) outputs, where existing methods often require extensive annotated data or costly LLM-as-evaluator pipelines. The authors propose CPIL, a framework based on cross-paraphrastic invariance learning, which learns representations that are robust to surface-form variation yet sensitive to whether a claim is grounded in its source document. CPIL builds positive pairs from semantic paraphrases of each example and hard negative pairs from examples that share a source document but carry opposite labels. It then trains in two stages, contrastive pretraining followed by a lightweight binary classifier for efficient hallucination detection. Using only about 1% of the annotated data, the method surpasses strong baselines in F1 score on the LLM-AggreFact benchmark (11 tasks), demonstrating strong label efficiency.
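The pair construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Example` record, the helper names, and the `paraphrase` callable are all hypothetical, and the paraphraser is assumed to rewrite only the claim while keeping the source document and label fixed.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Example:
    doc_id: str  # identifier of the source document
    claim: str   # LLM-generated claim to be checked against the document
    label: int   # 1 = grounded, 0 = hallucinated


def positive_pairs(examples, paraphrase):
    """Pair each example with a paraphrased view of its claim.

    `paraphrase` is a hypothetical callable (e.g. a paraphrasing model);
    the label is copied unchanged because rewording should not change groundedness.
    """
    return [(ex, Example(ex.doc_id, paraphrase(ex.claim), ex.label)) for ex in examples]


def hard_negative_pairs(examples):
    """Pair claims that share a source document but carry opposite labels."""
    by_doc = defaultdict(lambda: {0: [], 1: []})
    for ex in examples:
        by_doc[ex.doc_id][ex.label].append(ex)
    pairs = []
    for groups in by_doc.values():
        # Every (grounded, hallucinated) combination within one document.
        pairs.extend(product(groups[1], groups[0]))
    return pairs
```

For example, two claims about the same document, one grounded and one not, yield exactly one hard negative pair, while a document with a single claim contributes none. Restricting negatives to the same document forces the model to separate claims by grounding rather than by topic.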

📝 Abstract

Large language models (LLMs) frequently generate hallucinations: content that is not supported by the source document. To avoid costly LLM-as-evaluator pipelines and the heavy annotation demands of existing classifiers, we propose CPIL (Cross Paraphrastic Invariance Learning), a two-stage Siamese framework that maximizes the utility of existing labeled data. Concretely, CPIL constructs informative training pairs by (i) generating paraphrastic views of each document-claim example as positives and explicitly aligning their representations to enforce invariance to surface form; and (ii) mining same-document, opposite-label pairs as hard negatives to sharpen document-sensitive decision boundaries. CPIL then trains in two stages: Stage 1 performs contrastive pretraining to learn a paraphrase-invariant, grounding-aware embedding space, and Stage 2 attaches a lightweight classifier for binary groundedness prediction. On the LLM-AggreFact benchmark (11 tasks), CPIL surpasses strong baselines in F1 score using only ~1% of the labeled data, demonstrating both its predictive performance and its label efficiency.
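The two training stages can be illustrated with a small, dependency-free sketch. The abstract does not specify the Stage 1 objective or the Stage 2 head, so this example assumes the classic margin-based contrastive loss for Siamese pairs and a logistic-regression head; the function names, the Euclidean distance, and the `margin` default are illustrative choices. Both embeddings `z1` and `z2` are assumed to come from the same shared-weight encoder.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(z1, z2, is_positive, margin=1.0):
    """Stage 1 sketch (assumed margin-based contrastive loss) for one Siamese pair.

    Positive pairs (paraphrastic views) are pulled together by penalizing squared
    distance; hard negatives (same document, opposite label) are penalized only
    while they are closer than `margin`.
    """
    d = euclidean(z1, z2)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


def groundedness_prob(z, weights, bias):
    """Stage 2 sketch (assumed logistic-regression head) on a frozen embedding z.

    Returns P(grounded); uses a numerically stable sigmoid.
    """
    logit = sum(w * x for w, x in zip(weights, z)) + bias
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)
```

With `z1 = [0.0, 0.0]` and `z2 = [3.0, 4.0]` (distance 5), a positive pair costs 25.0, while a hard negative costs nothing under the default margin of 1.0 because it is already far enough apart. Keeping the classifier this small in Stage 2 is what makes the final detector cheap to train once the embedding space has been shaped in Stage 1.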

Problem

Research questions and friction points this paper is trying to address.

hallucination detection

large language models

paraphrastic invariance

label efficiency

groundedness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross Paraphrastic Invariance

Hallucination Detection

Contrastive Pretraining