🤖 AI Summary
This study investigates transparency gaps in the privacy practices of large language model (LLM) service providers by analyzing the content of their privacy policies, how those policies evolve over time, and how they diverge from conventional web/mobile privacy policies. Method: Leveraging a longitudinal dataset of 74 historical privacy policies and 115 supplementary documents, we conduct sentence-level revision analysis across over 3,000 policy statements and introduce the first LLM-specific privacy policy taxonomy. Contribution/Results: Findings reveal pervasive verbosity, ambiguity, and low readability; updates concentrate on first-party data collection and international/audience-specific sections; disclosure comprehensiveness is higher in North America and Europe; and policy revisions are primarily triggered by product launches and regulatory interventions, with significant cross-regional disparities. Collectively, the results demonstrate systemic opacity in LLM privacy governance and offer empirical evidence and a methodological framework for AI policy assessment and regulatory oversight.
📝 Abstract
Large language model (LLM) services have been rapidly integrated into people's daily lives as chatbots and agentic systems. They are fueled by rich streams of collected data, raising privacy concerns about excessive collection of sensitive personal information. Privacy policies are the fundamental mechanism for informing users about data practices in the modern information privacy paradigm. Although traditional web and mobile privacy policies are well studied, the privacy policies of LLM providers, their LLM-specific content, and their evolution over time remain largely underexplored. In this paper, we present the first longitudinal empirical study of privacy policies for mainstream LLM providers worldwide. We curate a chronological dataset of 74 historical privacy policies and 115 supplemental privacy documents from 11 LLM providers across 5 countries, covering versions up to August 2025, and extract over 3,000 sentence-level edits between consecutive policy versions. We compare LLM privacy policies with those of other software formats, propose a taxonomy tailored to LLM privacy policies, annotate policy edits, and align them with a timeline of key LLM ecosystem events. Results show that LLM privacy policies are substantially longer than conventional web and mobile policies, demand college-level reading ability, and remain highly vague. Our taxonomy analysis reveals patterns in how providers disclose LLM-specific practices and highlights regional disparities in coverage. Policy edits are concentrated in first-party data collection and international/specific-audience sections, and product releases and regulatory actions are their primary drivers, shedding light on both the status quo and the evolution of LLM privacy policies.