Entropy-Guided Attention for Private LLMs

📅 2025-01-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high communication overhead and latency bottlenecks of private inference (PI) on encrypted data, this paper proposes an information-theoretic framework for attention design in decoder-only large language models (LLMs). Using Shannon entropy to quantify attention head diversity, the authors reveal a dual role of nonlinearities: beyond stabilizing training, they preserve the breadth of attention output distributions. Removing them triggers two failure modes: "entropy collapse" in deeper layers, which destabilizes training, and "entropic overload" in earlier layers, which under-utilizes multi-head attention's representational capacity. Building on this insight, the paper introduces an entropy-guided attention mechanism with a novel entropy regularization technique to mitigate entropic overload, and explores PI-friendly alternatives to layer normalization that prevent entropy collapse in reduced-nonlinearity Transformer decoders. Experiments show improved attention head utilization and stable training in these reduced-nonlinearity models, establishing entropy dynamics as an architectural foundation for efficient, scalable private LLM inference.
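The summary's central diagnostic is Shannon entropy computed over each attention head's output distribution. A minimal sketch of that measurement, assuming post-softmax attention weights of shape (batch, heads, queries, keys); the function name and averaging choices are illustrative, not the paper's exact implementation:

```python
import numpy as np

def attention_entropy(attn_weights: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy of attention distributions, per head.

    attn_weights: (batch, heads, queries, keys); each row over keys sums to 1.
    Returns an array of shape (heads,): mean entropy (in nats) per head.
    """
    # H(p) = -sum_k p_k * log(p_k), taken over the key dimension;
    # eps guards against log(0) for exactly-zero weights.
    ent = -np.sum(attn_weights * np.log(attn_weights + eps), axis=-1)
    # Average over batch and query positions for a per-head diagnostic
    return ent.mean(axis=(0, 2))
```

A head attending uniformly over k keys scores the maximum log(k) (the "entropic overload" regime), while a head collapsed onto a single key scores near zero, so tracking this quantity per layer distinguishes the two failure modes the summary describes.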

📝 Abstract
The pervasiveness of proprietary language models has raised critical privacy concerns, necessitating advancements in private inference (PI), where computations are performed directly on encrypted data without revealing users' sensitive information. While PI offers a promising solution, its practical deployment is hindered by substantial communication and latency overheads, primarily stemming from nonlinear operations. To address this, we introduce an information-theoretic framework to characterize the role of nonlinearities in decoder-only language models, laying a principled foundation for optimizing transformer architectures tailored to the demands of PI. By leveraging Shannon's entropy as a quantitative measure, we uncover the previously unexplored dual significance of nonlinearities: beyond ensuring training stability, they are crucial for maintaining attention head diversity. Specifically, we find that their removal triggers two critical failure modes: "entropy collapse" in deeper layers that destabilizes training, and "entropic overload" in earlier layers that leads to under-utilization of Multi-Head Attention's (MHA) representational capacity. We propose an entropy-guided attention mechanism paired with a novel entropy regularization technique to mitigate entropic overload. Additionally, we explore PI-friendly alternatives to layer normalization for preventing entropy collapse and stabilizing the training of LLMs with reduced nonlinearities. Our study bridges the gap between information theory and architectural design, establishing entropy dynamics as a principled guide for developing efficient PI architectures. The code and implementation are available at https://github.com/Nandan91/entropy-guided-attention-llm.
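The abstract's entropy regularization targets entropic overload, i.e. heads whose attention entropy sits near the uniform maximum. A hedged sketch of one plausible penalty; the hinge form and threshold fraction are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def entropy_overload_penalty(attn_weights: np.ndarray,
                             threshold_frac: float = 0.9,
                             eps: float = 1e-12) -> float:
    """Illustrative regularizer: penalize heads whose mean attention entropy
    exceeds a fraction of the maximum (uniform) entropy log(keys).

    threshold_frac and the hinge form are assumed here for illustration.
    attn_weights: (batch, heads, queries, keys), rows over keys sum to 1.
    """
    keys = attn_weights.shape[-1]
    max_entropy = np.log(keys)  # entropy of a uniform distribution over keys
    # Mean Shannon entropy per head, averaged over batch and query positions
    per_head = -np.sum(attn_weights * np.log(attn_weights + eps),
                       axis=-1).mean(axis=(0, 2))
    # Hinge: only entropy above the threshold contributes to the penalty
    excess = np.maximum(per_head - threshold_frac * max_entropy, 0.0)
    return float(excess.sum())
```

Such a term would be added to the training loss with a small weight, discouraging near-uniform heads without forcing the collapse that the abstract identifies as the opposite failure mode.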
Problem

Research questions and friction points this paper is trying to address.

Privacy-Preserving Computation
Differential Privacy
Large Language Model Training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-guided Attention
Entropy Regularization
Private Inference Optimization