🤖 AI Summary
This study investigates transparency gaps in the privacy practices of large language model (LLM) service providers by analyzing the content of their privacy policies, how those policies evolve over time, and how they diverge from conventional web/mobile privacy policies. Method: Leveraging a longitudinal dataset of 74 historical privacy policies and 115 supplementary documents, we conduct sentence-level revision analysis across over 3,000 policy statements and introduce the first LLM-specific privacy policy taxonomy. Contribution/Results: Findings reveal pervasive verbosity, ambiguity, and low readability; updates concentrate on first-party data collection and international/audience-specific sections; disclosure comprehensiveness is higher in North America and Europe; and policy revisions are primarily triggered by product launches and regulatory interventions, with significant cross-regional disparities. Collectively, the results demonstrate systemic opacity in LLM privacy governance and offer empirical evidence and a methodological framework for AI policy assessment and regulatory oversight.
📝 Abstract
Large language model (LLM) services have been rapidly integrated into people's daily lives as chatbots and agentic systems. They are fueled by rich streams of collected data, raising privacy concerns about excessive collection of sensitive personal information. Privacy policies are the fundamental mechanism for informing users about data practices in the modern information privacy paradigm. Although traditional web and mobile privacy policies are well studied, the privacy policies of LLM providers, their LLM-specific content, and their evolution over time remain largely underexplored. In this paper, we present the first longitudinal empirical study of privacy policies for mainstream LLM providers worldwide. We curate a chronological dataset of 74 historical privacy policies and 115 supplemental privacy documents from 11 LLM providers across 5 countries, covering versions up to August 2025, and extract over 3,000 sentence-level edits between consecutive policy versions. We compare LLM privacy policies with those of other software formats, propose a taxonomy tailored to LLM privacy policies, annotate policy edits, and align them with a timeline of key LLM ecosystem events. Results show that LLM privacy policies are substantially longer than conventional web and mobile policies, demand college-level reading ability, and remain highly vague. Our taxonomy analysis reveals patterns in how providers disclose LLM-specific practices and highlights regional disparities in coverage. Policy edits are concentrated in first-party data collection and international/specific-audience sections, and product releases and regulatory actions are their primary drivers, shedding light on both the status quo and the evolution of LLM privacy policies.