The dynamics of meaning through time: Assessment of Large Language Models

📅 2025-01-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates large language models’ (LLMs) capacity to comprehend the historical semantic evolution of concepts. To address the lack of rigorous evaluation for temporal semantics, we propose the first multidimensional benchmark framework specifically designed for semantic temporal dynamics. Our method introduces a cross-temporal term explanation task, integrating automated metrics (perplexity, generation length, and consistency) with double-blind subjective evaluation by experts, augmented by temporally grounded prompt engineering to enhance time sensitivity. Results reveal systematic generational disparities: GPT-4 and Claude exhibit significantly greater long-term semantic stability than Llama and early ChatGPT. Critically, all models manifest a pronounced “presentism” bias, with average accuracy dropping by 37% on pre-industrial terminology. These findings expose structural limitations in LLMs’ historical contextualization capabilities and establish a reproducible benchmark and methodological paradigm for temporally aware language understanding.

📝 Abstract
Understanding how large language models (LLMs) grasp the historical context of concepts and their semantic evolution is essential to advancing artificial intelligence and linguistic studies. This study aims to evaluate the capabilities of various LLMs in capturing temporal dynamics of meaning, specifically how they interpret terms across different time periods. We analyze a diverse set of terms from multiple domains, using tailored prompts and measuring responses through both objective metrics (e.g., perplexity and word count) and subjective human expert evaluations. Our comparative analysis includes prominent models like ChatGPT, GPT-4, Claude, Bard, Gemini, and Llama. Findings reveal marked differences in each model's handling of historical context and semantic shifts, highlighting both strengths and limitations in temporal semantic understanding. These insights offer a foundation for refining LLMs to better address the evolving nature of language, with implications for historical text analysis, AI design, and applications in digital humanities.
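The objective metrics named in the abstract (perplexity and word count) are standard quantities; a minimal sketch of how they might be computed from per-token probabilities is shown below. The paper does not specify its implementation, so the function names and the assumption that per-token probabilities are available are illustrative only.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(-(1/N) * sum(log p_i)) over N token probabilities."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

def word_count(text):
    """Simple whitespace-delimited word count, one possible objective metric."""
    return len(text.split())

# A model assigning uniform probability 0.25 to each of 4 tokens
# has perplexity of about 4 (it is "choosing among 4 options").
print(perplexity([0.25, 0.25, 0.25, 0.25]))
print(word_count("steam engine of the pre-industrial era"))
```

Lower perplexity on period-specific terminology would indicate the model finds the historical usage less surprising, which is how such a metric can serve as a proxy for temporal semantic fit.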
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Historical Concept Understanding
Semantic Change Over Time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Semantic Change
Historical Context Understanding