Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization

πŸ“… 2026-02-16
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study investigates the trade-off between personalization effectiveness and privacy preservation in large language models operating under long-context conditions. To this end, we introduce PAPerBench, a large-scale benchmark spanning context lengths from 1K to 256K tokens and comprising 377K evaluation queries across multiple scenarios. Through controlled experiments and theoretical analysis grounded in soft attention mechanisms, we uncover and formally explain the “long-context, low-focus” scaling gap: a phenomenon in which personalization performance and privacy risk decline simultaneously as context length increases. We identify attention dilution as the fundamental cause of this dual degradation. Our findings offer novel insights and a foundational direction for developing scalable, privacy-aware personalized language models.
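A minimal sketch of why soft attention forces this dual decline (this is the standard dilution argument, not the paper's own derivation; the logit bound B and the relevant-token set S are assumptions introduced here for illustration):

```latex
% Softmax attention for one query over a context of n tokens:
\[
  \alpha_i \;=\; \frac{\exp(z_i)}{\sum_{j=1}^{n} \exp(z_j)},
  \qquad z_i = \frac{q^{\top} k_i}{\sqrt{d}} .
\]
% If every logit is bounded, |z_i| <= B, then for any fixed set S of
% relevant tokens (profile facts, or a private attribute):
\[
  \sum_{i \in S} \alpha_i
  \;\le\; \frac{|S|\, e^{B}}{n\, e^{-B}}
  \;=\; \frac{|S|\, e^{2B}}{n}
  \;=\; O\!\left(\tfrac{1}{n}\right).
\]
```

Since the bound applies to whatever span holds the user's personal information, both the signal needed for personalization and the signal that could leak receive vanishing attention as n grows.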

πŸ“ Abstract
Large language models (LLMs) are increasingly deployed in privacy-critical and personalization-oriented scenarios, yet the role of context length in shaping privacy leakage and personalization effectiveness remains largely unexplored. We introduce a large-scale benchmark, PAPerBench, to systematically study how increasing context length influences both personalization quality and privacy protection in LLMs. The benchmark comprises approximately 29,000 instances with context lengths ranging from 1K to 256K tokens, yielding a total of 377K evaluation questions. It jointly evaluates personalization performance and privacy risks across diverse scenarios, enabling controlled analysis of long-context model behavior. Extensive evaluations of state-of-the-art LLMs reveal consistent degradation in both personalization and privacy performance as context length increases. We further provide a theoretical analysis of attention dilution under context scaling, explaining this behavior as an inherent limitation of soft attention in fixed-capacity Transformers. Together, the empirical and theoretical findings suggest a general scaling gap in current models: long context, less focus. We release the benchmark to support reproducible evaluation and future research on scalable privacy and personalization. Code and data are available at https://github.com/SafeRL-Lab/PAPerBench.
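A toy numerical illustration of the dilution effect the abstract describes (a sketch only, not the PAPerBench evaluation code; the span size, head dimension, and random Gaussian keys are assumptions chosen here for demonstration):

```python
import numpy as np

def attention_mass_on_span(n_context, span=32, d=64, seed=0):
    """Fraction of softmax attention one query places on a fixed 'relevant'
    span (e.g., a user's profile facts) embedded among n_context tokens."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(d)
    keys = rng.standard_normal((n_context, d))
    logits = keys @ q / np.sqrt(d)            # scaled dot-product scores
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights[:span].sum()               # mass on the first `span` tokens

for n in (1_000, 8_000, 64_000, 256_000):
    print(f"context {n:>7,}: mass on relevant span = {attention_mass_on_span(n):.5f}")
```

With random keys the mass on the span tracks span/n almost exactly; trained attention is sharper than this, but the fixed-capacity softmax normalization that drives the decay is the same.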
Problem

Research questions and friction points this paper is trying to address.

context length
privacy leakage
personalization
large language models
scaling gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

long-context LLMs
privacy leakage
personalization
attention dilution
scaling gap
πŸ”Ž Similar Papers
No similar papers found.