AI Summary
This paper identifies a systematic deficiency in large language models (LLMs) in understanding and generalizing time-sensitive facts: statements valid only on specific days, months, or years. Using a Wikidata-derived benchmark of temporal facts spanning three granularities (day, month, and year), the authors run a controlled, prompt-based empirical evaluation showing that state-of-the-art models, including Llama-3.1-70B, fall short on both temporal fact accuracy and cross-granularity generalization. The core contributions are: (1) the first empirical demonstration that LLMs lack time-granularity generalization, exposing a fundamental limitation in their use as dynamic knowledge bases; and (2) the first fine-grained temporal robustness evaluation framework, enabling comparable assessment of both pre-trained and instruction-tuned models. The results indicate that current LLMs do not reach the precision required for high-fidelity temporal knowledge services.
Abstract
This paper explores the temporal robustness of language models (LMs) in handling factual knowledge. While LMs can often complete simple factual statements, their ability to handle temporal facts (those valid only within specific time frames) remains uncertain. We design a controlled experiment to test the robustness of temporal factual knowledge in LMs, and use it to evaluate several pretrained and instruction-tuned models by prompting them with popular Wikidata facts at different temporal granularities (day, month, and year). Our findings indicate that even very large state-of-the-art models, such as Llama-3.1-70B, largely lack robust knowledge of temporal facts and are unable to generalize that knowledge from one granularity to another. These results highlight inherent limitations of using LMs as temporal knowledge bases. The source code and data to reproduce our experiments will be released.
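The evaluation protocol described above, prompting a model on the same Wikidata fact anchored at day, month, and year granularity, can be sketched as follows. This is a minimal illustration only: the example fact, the prompt templates, and the function names are assumptions for exposition, not the paper's actual implementation.

```python
from datetime import date

# Illustrative temporal fact in Wikidata style: (subject, relation, object)
# with a validity interval. This specific fact is an example, not paper data.
FACT = {
    "subject": "Angela Merkel",
    "relation": "position held",
    "object": "Chancellor of Germany",
    "start": date(2005, 11, 22),
    "end": date(2021, 12, 8),
}

def temporal_prompt(fact: dict, when: date, granularity: str) -> str:
    """Render a cloze-style prompt anchoring the fact at the given
    temporal granularity ('day', 'month', or 'year')."""
    if granularity == "day":
        anchor = when.strftime("On %d %B %Y")
    elif granularity == "month":
        anchor = when.strftime("In %B %Y")
    elif granularity == "year":
        anchor = when.strftime("In %Y")
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return f"{anchor}, {fact['subject']}'s {fact['relation']} was"

# The same fact yields three prompts with the same correct completion,
# so a temporally robust model should answer all three consistently.
probe_date = date(2013, 6, 15)
prompts = {g: temporal_prompt(FACT, probe_date, g)
           for g in ("day", "month", "year")}
```

Comparing a model's completions across the three prompts operationalizes the cross-granularity generalization test: a disagreement between, say, the year-level and day-level answers counts against temporal robustness.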