Privacy Guard & Token Parsimony by Prompt and Context Handling and LLM Routing

📅 2026-03-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the trade-off between operational cost and privacy in the large-scale deployment of large language models (LLMs), where existing routing mechanisms often neglect prompt sensitivity and risk inadvertent leakage of sensitive information. To mitigate this, the authors propose the "Privacy Guard" framework, which leverages a local small language model to perform context-aware abstractive summarisation and Automatic Prompt Optimisation (APO), dynamically re-routing high-risk queries to Zero-Trust or NDA-covered models. The framework introduces the "Inseparability Paradigm", which unifies context management with privacy preservation and establishes a mathematical duality between token efficiency and privacy protection through prompt decomposition, LIFO-based context compression, and bi-objective optimisation. Empirical evaluation on a 1,000-sample benchmark demonstrates a 45% reduction in blended operational cost, 100% redaction of personal secrets, and user preference for APO-compressed responses in 85% of cases.
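The sensitivity-aware routing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the keyword-based `sensitivity_score`, the term list, and the routing threshold are all hypothetical stand-ins for the on-premise SLM classifier the framework actually uses.

```python
from dataclasses import dataclass

@dataclass
class Route:
    target: str   # "cloud" or "zero_trust" (hypothetical labels)
    prompt: str

# Hypothetical keyword list standing in for the local SLM's
# learned notion of prompt sensitivity.
SENSITIVE_TERMS = {"password", "ssn", "diagnosis", "salary", "contract"}

def sensitivity_score(prompt: str) -> float:
    """Fraction of words that match a sensitive term (toy proxy)."""
    words = prompt.lower().split()
    hits = sum(1 for w in words if w.strip(".,") in SENSITIVE_TERMS)
    return hits / max(len(words), 1)

def route(prompt: str, threshold: float = 0.05) -> Route:
    # High-risk prompts stay with a Zero-Trust / NDA-covered model;
    # low-risk prompts go to the cheaper third-party cloud model.
    if sensitivity_score(prompt) > threshold:
        return Route("zero_trust", prompt)
    return Route("cloud", prompt)
```

In the full framework this decision point is where APO and redaction would run before any tokens leave the local boundary.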
📝 Abstract
The large-scale adoption of Large Language Models (LLMs) forces a trade-off between operational cost (OpEx) and data privacy. Current routing frameworks reduce costs but ignore prompt sensitivity, exposing users and institutions to leakage risks towards third-party cloud providers. We formalise the "Inseparability Paradigm": advanced context management intrinsically coincides with privacy management. We propose a local "Privacy Guard" -- a holistic contextual observer powered by an on-premise Small Language Model (SLM) -- that performs abstractive summarisation and Automatic Prompt Optimisation (APO) to decompose prompts into focused sub-tasks, re-routing high-risk queries to Zero-Trust or NDA-covered models. This dual mechanism simultaneously eliminates sensitive inference vectors (Zero Leakage) and reduces cloud token payloads (OpEx Reduction). A LIFO-based context compacting mechanism further bounds working memory, limiting the emergent leakage surface. We validate the framework through a 2x2 benchmark (Lazy vs. Expert users; Personal vs. Institutional secrets) on a 1,000-sample dataset, achieving a 45% blended OpEx reduction, 100% redaction success on personal secrets, and -- via LLM-as-a-Judge evaluation -- an 85% preference rate for APO-compressed responses over raw baselines. Our results demonstrate that Token Parsimony and Zero Leakage are mathematically dual projections of the same contextual compression operator.
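The LIFO-based context compacting the abstract mentions can be illustrated with a short sketch. This is an assumption-laden toy: the word-count token estimate, the budget, and the truncating `summarise` placeholder (which stands in for the SLM's abstractive summariser) are all invented for illustration.

```python
def compact_lifo(turns, max_tokens, summarise=lambda t: t[:20] + "…"):
    """Keep the most recent turns verbatim within a token budget
    (newest-first, i.e. LIFO); older turns that no longer fit are
    collapsed into short abstractive stubs, bounding working memory
    and with it the leakage surface."""
    kept, budget = [], max_tokens
    for turn in reversed(turns):          # newest first (LIFO)
        cost = len(turn.split())          # toy token estimate
        if cost <= budget:
            kept.append(turn)
            budget -= cost
        else:
            kept.append(summarise(turn))  # compress what overflows
    kept.reverse()                        # restore chronological order
    return kept
```

Because recent turns are admitted first, the compression pressure falls on the oldest context, which matches the intuition that stale turns carry the least task-relevant (and potentially the most forgettable sensitive) detail.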
Problem

Research questions and friction points this paper is trying to address.

data privacy
operational cost
prompt sensitivity
information leakage
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Privacy Guard
Automatic Prompt Optimisation
Inseparability Paradigm
Token Parsimony
Zero Leakage