Automated Annotation of Evolving Corpora for Augmenting Longitudinal Network Data: A Framework Integrating Large Language Models and Expert Knowledge

📅 2025-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenges of delayed and inconsistent manual annotation caused by semantic evolution in longitudinal network analysis, this paper proposes Expert-Augmented LLM Annotation (EALA), a knowledge-enhanced framework that automates annotation with large language models (LLMs). EALA integrates a structured semantic codebook, prompt engineering driven by historical annotations, and a consistency verification mechanism to enable interpretable and scalable annotation across interaction types and thematic domains that evolve over time. Experiments on a climate negotiation dataset demonstrate that EALA significantly improves annotation timeliness and cross-temporal consistency, achieving state-of-the-art accuracy in interaction-type classification. Moreover, it provides the first systematic empirical analysis revealing critical limitations of LLMs in fine-grained semantic drift detection. This work establishes a reliable, reproducible, and automated annotation paradigm for long-term dynamic modeling of political, economic, and social systems.
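The summary names three ingredients (a codebook, prompts built from historical annotations, and a consistency check) without implementation details. The sketch below shows one plausible way to wire them together and is not the authors' code. It assumes: codebook entries are label/definition pairs, past annotations are reused as few-shot examples, and "consistency verification" is approximated by majority voting over repeated model calls, with low-agreement items returned as `None` for expert review. `CodebookEntry`, `Example`, `build_prompt`, `annotate_with_vote`, the threshold values, and the `llm` callable are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class CodebookEntry:
    """One expert-defined category (hypothetical structure)."""
    label: str
    definition: str


@dataclass(frozen=True)
class Example:
    """A historically annotated passage reused as a few-shot demonstration."""
    text: str
    label: str


def build_prompt(codebook: Sequence[CodebookEntry],
                 examples: Sequence[Example],
                 target_text: str) -> str:
    """Assemble a prompt: codebook definitions, past annotations, then the new passage."""
    lines = [
        "Classify the interaction described in the text.",
        "Answer with exactly one label from the codebook.",
        "",
        "Codebook:",
    ]
    lines += [f"- {e.label}: {e.definition}" for e in codebook]
    lines += ["", "Previously annotated examples:"]
    for ex in examples:
        lines += [f"Text: {ex.text}", f"Label: {ex.label}", ""]
    lines += [f"Text: {target_text}", "Label:"]
    return "\n".join(lines)


def annotate_with_vote(llm: Callable[[str], str],
                       prompt: str,
                       valid_labels: Set[str],
                       n_samples: int = 5,
                       min_agreement: float = 0.6) -> Tuple[Optional[str], float]:
    """Query the model several times and keep the majority label only if enough
    samples agree; otherwise return None so the item can go to an expert."""
    votes = []
    for _ in range(n_samples):
        answer = llm(prompt).strip()
        if answer in valid_labels:  # discard answers that are not codebook labels
            votes.append(answer)
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / n_samples  # share of all samples that chose the majority label
    return (label if agreement >= min_agreement else None), agreement
```

In practice `llm` would wrap whatever model API is used, sampled with nonzero temperature so that repeated calls can disagree; here any function from prompt string to answer string works, which also makes the logic easy to test with a stub.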


📝 Abstract

Longitudinal network data are essential for analyzing political, economic, and social systems and processes. In political science, these datasets are often generated through human annotation or supervised machine learning applied to evolving corpora. However, as semantic contexts shift over time, inferring dynamic interaction types on emerging issues among a diverse set of entities poses significant challenges, particularly in maintaining timely and consistent annotations. This paper presents the Expert-Augmented LLM Annotation (EALA) approach, which leverages Large Language Models (LLMs) in combination with historically annotated data and expert-constructed codebooks to extrapolate and extend datasets into future periods. We evaluate the performance and reliability of EALA using a dataset of climate negotiations. Our findings demonstrate that EALA effectively predicts nuanced interactions between negotiation parties and captures the evolution of topics over time. At the same time, we identify several limitations inherent to LLM-based annotation, highlighting areas for further improvement. Given the wide availability of codebooks and annotated datasets, EALA holds substantial promise for advancing research in political science and beyond.
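Extending a dataset "into future periods" suggests a temporal evaluation: annotations before a cutoff serve as history, and model labels for later periods are compared against expert labels. The sketch below is a minimal version of that setup under stated assumptions. Records are assumed to be dicts with a `year` key, and reliability is measured with accuracy plus Cohen's kappa, a standard chance-corrected agreement statistic. The abstract does not say which metrics the paper reports, so `temporal_split`, `accuracy`, `cohens_kappa`, and the record layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def temporal_split(records: Sequence[Dict], cutoff_year: int) -> Tuple[List[Dict], List[Dict]]:
    """Records before the cutoff act as annotated history; later ones are the target period."""
    history = [r for r in records if r["year"] < cutoff_year]
    future = [r for r in records if r["year"] >= cutoff_year]
    return history, future


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Share of items where the model label matches the expert label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohens_kappa(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Chance-corrected agreement between expert labels and model labels."""
    n = len(gold)
    p_observed = accuracy(gold, pred)  # also validates the inputs
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Expected agreement if both annotators labeled independently at their own base rates.
    p_expected = sum(gold_counts[label] * pred_counts[label]
                     for label in set(gold) | set(pred)) / (n * n)
    if p_expected == 1.0:  # both sides used one identical label; kappa is undefined
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)
```

For example, gold labels `["a", "a", "b", "b"]` against predictions `["a", "b", "b", "b"]` give an accuracy of 0.75 and a kappa of 0.5, because half of the observed agreement would be expected by chance. scikit-learn's `cohen_kappa_score` computes the same unweighted statistic if a library implementation is preferred.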

Problem

Automates annotation of evolving corpora for longitudinal network data.

Integrates LLMs and expert knowledge to predict dynamic interactions.

Addresses challenges in maintaining timely and consistent annotations.
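The output of annotation is longitudinal network data: typed interactions between entities, grouped by period. As a minimal sketch, assuming each annotation is a `(period, source, target, interaction_type)` tuple and that edges are directed, the function below turns annotations into one weighted edge list per period. The tuple layout, the edge-counting rule, and `build_longitudinal_network` are hypothetical illustrations, not the paper's data format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Annotation = Tuple[int, str, str, str]  # (period, source, target, interaction_type)
Edge = Tuple[str, str, str]             # (source, target, interaction_type)


def build_longitudinal_network(annotations: Iterable[Annotation]) -> Dict[int, Dict[Edge, int]]:
    """Group annotated interactions by period and count repeated typed, directed edges."""
    networks: Dict[int, Dict[Edge, int]] = defaultdict(lambda: defaultdict(int))
    for period, source, target, interaction_type in annotations:
        networks[period][(source, target, interaction_type)] += 1
    # Convert to plain dicts ordered by period for downstream network tools.
    return {period: dict(edges) for period, edges in sorted(networks.items())}
```

Keeping the interaction type inside the edge key means the same pair of parties can have both a cooperative and a conflictual tie in one period, which a single weighted edge per pair would hide.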

Innovation

Combines LLMs with expert knowledge for annotation.

Extends datasets into future periods using historical annotations and codebooks.

Evaluates performance and reliability on a climate negotiation dataset.

🔎 Similar Papers

Extracting Affect Aggregates from Longitudinal Social Media Data with Temporal Adapters for Large Language Models

2024-09-26 · Citations: 0

Prompt Selection Matters: Enhancing Text Annotations for Social Sciences with Large Language Models

2024-07-15 · arXiv.org · Citations: 0