🤖 AI Summary
Large language models (LLMs) suffer from factual hallucinations, and existing synthetic evaluation benchmarks generalize poorly because they lack factual fidelity.
Method: We construct a high-fidelity benchmark of paired true/false factual statements, generated from real-world tabular data and question-answer pairs to yield structured, semantically grounded samples; we further propose the first LLM-dependent true/false data-generation pipeline. To examine internal representations, we introduce latent-state probing and cross-model consistency analysis, systematically assessing whether LLM hidden states encode factual information.
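The pairing strategy from tabular data can be illustrated with a minimal sketch: the true statement verbalizes a row's actual value, while the false counterpart swaps in a plausible distractor drawn from the same column. The toy table, template, and `make_pair` helper below are all illustrative assumptions, not the paper's exact implementation.

```python
import random

# Toy table standing in for real-world tabular data (illustrative only).
table = [
    {"country": "France", "capital": "Paris"},
    {"country": "Italy", "capital": "Rome"},
    {"country": "Spain", "capital": "Madrid"},
]

def make_pair(row, table, rng):
    """Return a (true, false) statement pair for one table row."""
    true_stmt = f"The capital of {row['country']} is {row['capital']}."
    # Plausible distractors: other values from the same column, so the
    # false statement stays semantically well-formed.
    distractors = [r["capital"] for r in table if r["capital"] != row["capital"]]
    false_stmt = f"The capital of {row['country']} is {rng.choice(distractors)}."
    return (true_stmt, True), (false_stmt, False)

rng = random.Random(0)
pairs = [make_pair(row, table, rng) for row in table]
```

Because distractors come from the same column, the false statements are structurally indistinguishable from the true ones, which is what makes the benchmark challenging.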
Contribution/Results: Experiments on two open-source LLMs validate several prior findings while revealing a significant drop in factual discrimination capability when evaluating model-generated text, highlighting a critical gap between benchmark performance and real-world behavior. Our work establishes a more realistic, challenging, and reproducible benchmark for factual evaluation, accompanied by a rigorous methodological framework for diagnosing factual grounding in LLMs.
📝 Abstract
Factual hallucinations, in which a model produces inaccurate or fabricated content, are a major challenge for Large Language Models (LLMs) because they undermine reliability and user trust. Recent studies suggest that the internal states of LLMs encode information about the truthfulness of the statements they generate. However, these studies often rely on synthetic datasets that lack realism, which limits their generalization to evaluating the factual accuracy of text generated by the model itself. In this paper, we challenge the findings of previous work by investigating truthfulness-encoding capabilities on a more realistic and challenging dataset. Specifically, we extend previous work by introducing: (1) a strategy for sampling plausible true-false factoid sentences from tabular data and (2) a procedure for generating realistic, LLM-dependent true-false datasets from Question Answering collections. Our analysis of two open-source LLMs reveals that while the findings from previous studies are partially validated, generalization to LLM-generated datasets remains challenging. This study lays the groundwork for future research on factuality in LLMs and offers practical guidelines for more effective evaluation.
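The latent-state probing described above can be sketched as training a linear probe on (hidden state, label) pairs and measuring held-out accuracy. The sketch below uses synthetic activations in place of real model hidden states, and a mass-mean probe (difference of class means) as one common linear-probe baseline; neither the dimensionality nor the probe type is claimed to match the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64   # assumed hidden-state dimensionality
n = 200  # statements per class

# Simulated hidden states: "true" activations are shifted along a latent
# truth direction, "false" activations the opposite way. In practice these
# vectors would be extracted from an LLM's forward pass over each statement.
truth_direction = rng.normal(size=d)
X = np.vstack([
    rng.normal(size=(n, d)) + truth_direction,  # true statements
    rng.normal(size=(n, d)) - truth_direction,  # false statements
])
y = np.array([1] * n + [0] * n)

# Shuffle and split into train/test sets.
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

# Mass-mean probe: classify by the sign of the projection onto the
# difference of class means, with the midpoint as the decision threshold.
mu_true = X_tr[y_tr == 1].mean(axis=0)
mu_false = X_tr[y_tr == 0].mean(axis=0)
w = mu_true - mu_false
b = -w @ (mu_true + mu_false) / 2
preds = (X_te @ w + b > 0).astype(int)
accuracy = (preds == y_te).mean()
```

If hidden states truly encode truthfulness, such a probe separates the classes well on in-distribution data; the paper's central question is whether that separation survives on more realistic, LLM-generated statements.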