Private Transformer Inference in MLaaS: A Survey

📅 2025-05-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Transformer-based inference in Machine Learning as a Service (MLaaS) risks dual privacy leakage, exposing both user data and proprietary model parameters. Method: We propose the first structured taxonomy and unified evaluation framework for Private Transformer Inference (PTI), which weighs privacy guarantees against computational overhead and inference accuracy. The solutions covered combine secure multi-party computation, homomorphic encryption, and hybrid privacy protocols to enable end-to-end encrypted inference. We survey representative PTI solutions from 2020 to 2025, identify critical efficiency bottlenecks, and delineate practical deployment pathways. Contribution/Results: This work delivers a roadmap and a standardized evaluation benchmark for privacy-preserving large language model (LLM) services. It sets out principles for quantifying trade-offs among security, efficiency, and utility, supporting rigorous, reproducible assessment of PTI systems and real-world adoption of confidential MLaaS.

📝 Abstract
Transformer models have revolutionized AI, powering applications like content generation and sentiment analysis. However, their deployment in Machine Learning as a Service (MLaaS) raises significant privacy concerns, primarily due to the centralized processing of sensitive user data. Private Transformer Inference (PTI) offers a solution by utilizing cryptographic techniques such as secure multi-party computation and homomorphic encryption, enabling inference while preserving both user data and model privacy. This paper reviews recent PTI advancements, highlighting state-of-the-art solutions and challenges. We also introduce a structured taxonomy and evaluation framework for PTI, focusing on balancing resource efficiency with privacy and bridging the gap between high-performance inference and data privacy.

Problem

Research questions and friction points this paper is trying to address.

Addressing privacy concerns in Transformer-based MLaaS deployment
Exploring cryptographic techniques for private Transformer inference
Balancing resource efficiency with privacy in PTI solutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses secure multi-party computation
Employs homomorphic encryption techniques
Balances resource efficiency with privacy
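
As a rough illustration of the secure multi-party computation idea named above, the following minimal sketch computes a dot product between a user's input and a model's weights using two-party additive secret sharing, so that neither vector is revealed in the clear. It is not taken from the paper: the modulus `MOD`, the helper names (`share`, `reconstruct`, `beaver_triple`, `secure_mul`, `secure_dot`), and the use of a trusted dealer for Beaver triples are all assumptions made for illustration, and both parties are simulated inside one process.

```python
import secrets

MOD = 2**61 - 1  # assumed prime modulus for the share arithmetic


def share(x):
    """Split x into two additive shares that sum to x modulo MOD."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(s0, s1):
    """Recombine two additive shares."""
    return (s0 + s1) % MOD


def beaver_triple():
    """Stand-in for a trusted dealer: shares of random a, b and c = a*b."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def secure_mul(x_sh, y_sh):
    """Multiply two shared values; only the masked d and e are opened."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    d = reconstruct((x_sh[0] - a0) % MOD, (x_sh[1] - a1) % MOD)  # x - a
    e = reconstruct((y_sh[0] - b0) % MOD, (y_sh[1] - b1) % MOD)  # y - b
    # Uses the identity x*y = c + d*b + e*a + d*e.
    z0 = (c0 + d * b0 + e * a0 + d * e) % MOD  # party 0 adds the public d*e
    z1 = (c1 + d * b1 + e * a1) % MOD
    return z0, z1


def secure_dot(x_shares, w_shares):
    """Dot product over shared vectors; additions are local to each party."""
    acc0 = acc1 = 0
    for x_sh, w_sh in zip(x_shares, w_shares):
        z0, z1 = secure_mul(x_sh, w_sh)
        acc0 = (acc0 + z0) % MOD
        acc1 = (acc1 + z1) % MOD
    return acc0, acc1


user_input = [3, 1, 4]      # hypothetical private user features
model_weights = [2, 7, 1]   # hypothetical private model weights
x_shares = [share(v) for v in user_input]
w_shares = [share(v) for v in model_weights]
result = reconstruct(*secure_dot(x_shares, w_shares))
print(result)  # 3*2 + 1*7 + 4*1 = 17
```

Linear layers reduce to dot products like this one. Non-linear Transformer operations such as softmax and GELU need additional protocols or approximations that this sketch does not cover.
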
Yang Li
Energy Research Institute @ NTU, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore; College of Computing and Data Science, Nanyang Technological University, Singapore
Xinyu Zhou
Energy Research Institute @ NTU, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore; College of Computing and Data Science, Nanyang Technological University, Singapore
Yitong Wang
ByteDance Inc.
computer vision
Liangxin Qian
Nanyang Technological University, College of Computing and Data Science (CCDS)
Wireless Communications, Convex Optimization
Jun Zhao
College of Computing and Data Science, Nanyang Technological University, Singapore