🤖 AI Summary
Transformer-based inference in Machine Learning as a Service (MLaaS) risks dual privacy leakage: it can expose both user inputs and proprietary model parameters.
Method: We propose the first structured taxonomy and unified evaluation framework for Private Transformer Inference (PTI), one that systematically weighs privacy guarantees, computational overhead, and inference accuracy. The surveyed approaches combine secure multi-party computation, homomorphic encryption, and hybrid privacy protocols to enable end-to-end encrypted inference. We comprehensively survey representative PTI solutions from 2020–2025, identifying critical efficiency bottlenecks and delineating practical deployment pathways.
Contribution/Results: This work delivers a technically viable roadmap and standardized evaluation benchmark for privacy-preserving large language model (LLM) services. It establishes foundational principles for quantifying trade-offs among security, efficiency, and utility—enabling rigorous, reproducible assessment of PTI systems and accelerating real-world adoption of confidential MLaaS.
📝 Abstract
Transformer models have revolutionized AI, powering applications such as content generation and sentiment analysis. However, their deployment in Machine Learning as a Service (MLaaS) raises significant privacy concerns, primarily because sensitive user data is processed centrally. Private Transformer Inference (PTI) addresses this by using cryptographic techniques such as secure multi-party computation and homomorphic encryption, enabling inference that preserves both user data and model privacy. This paper reviews recent PTI advances, highlighting state-of-the-art solutions and open challenges. We also introduce a structured taxonomy and evaluation framework for PTI, focused on balancing resource efficiency with privacy and bridging the gap between high-performance inference and data protection.
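As a toy illustration of the secret-sharing idea that MPC-based PTI protocols build on (this sketch is not from the paper; the modulus, function names, and the simplification of keeping model weights public are our assumptions), the snippet below shows how a linear layer can be evaluated on additively shared inputs, with neither party ever seeing the plaintext:

```python
import secrets

# Minimal 2-party additive secret sharing over a prime field.
# Illustrative only: real PTI protocols also hide the model weights
# and must handle fixed-point encoding of real-valued activations.
P = 2**61 - 1  # prime modulus (an arbitrary choice for this sketch)

def share(x: int) -> tuple[int, int]:
    """Split x into two random-looking shares with x = (s0 + s1) mod P."""
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def reconstruct(s0: int, s1: int) -> int:
    """Recombine the two shares into the secret value."""
    return (s0 + s1) % P

def local_dot(shares: list[int], weights: list[int]) -> int:
    """Each party runs this locally on its own shares; no communication
    is needed because additive sharing commutes with linear operations."""
    return sum(s * w for s, w in zip(shares, weights)) % P

x = [3, 1, 4]   # private user input vector
w = [2, 7, 1]   # model weights (public in this simplified sketch)

pairs = [share(v) for v in x]          # dealer splits each input element
party0 = local_dot([p[0] for p in pairs], w)
party1 = local_dot([p[1] for p in pairs], w)

# Summing the two local results reconstructs the true dot product: 17
assert reconstruct(party0, party1) == sum(a * b for a, b in zip(x, w)) % P
```

Because addition and scalar multiplication commute with additive sharing, linear layers come almost for free; the nonlinear components of a Transformer (softmax, GELU, LayerNorm) are where the surveyed protocols spend most of their cryptographic effort and where the efficiency bottlenecks discussed above arise.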