🤖 AI Summary
Transformer-based inference in Machine Learning as a Service (MLaaS) risks dual privacy leakage: it can expose both user inputs and proprietary model parameters.
Method: We propose the first structured taxonomy and unified evaluation framework for Private Transformer Inference (PTI), one that systematically weighs privacy guarantees, computational overhead, and inference accuracy. The surveyed approaches combine secure multi-party computation, homomorphic encryption, and hybrid privacy protocols to enable end-to-end encrypted inference. We comprehensively survey representative PTI solutions from 2020–2025, identifying critical efficiency bottlenecks and delineating practical deployment pathways.
Contribution/Results: This work delivers a technically viable roadmap and standardized evaluation benchmark for privacy-preserving large language model (LLM) services. It establishes foundational principles for quantifying trade-offs among security, efficiency, and utility—enabling rigorous, reproducible assessment of PTI systems and accelerating real-world adoption of confidential MLaaS.
📝 Abstract
Transformer models have revolutionized AI, powering applications such as content generation and sentiment analysis. However, their deployment in Machine Learning as a Service (MLaaS) raises significant privacy concerns, primarily because sensitive user data is processed centrally. Private Transformer Inference (PTI) addresses this by using cryptographic techniques such as secure multi-party computation and homomorphic encryption, enabling inference that preserves both user data and model privacy. This paper reviews recent PTI advances, highlighting state-of-the-art solutions and open challenges. We also introduce a structured taxonomy and evaluation framework for PTI, focused on balancing resource efficiency with privacy and bridging the gap between high-performance inference and data protection.
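As a toy illustration of the secret-sharing idea that MPC-based PTI protocols build on (this sketch is not from the paper; the modulus, function names, and the simplification of keeping model weights public are our assumptions), the snippet below shows how a linear layer can be evaluated on additively shared inputs, with neither party ever seeing the plaintext:

```python
import secrets

# Minimal 2-party additive secret sharing over a prime field.
# Illustrative only: real PTI protocols also hide the model weights
# and must handle fixed-point encoding of real-valued activations.
P = 2**61 - 1  # prime modulus (an arbitrary choice for this sketch)

def share(x: int) -> tuple[int, int]:
    """Split x into two random-looking shares with x = (s0 + s1) mod P."""
    s0 = secrets.randbelow(P)
    return s0, (x - s0) % P

def reconstruct(s0: int, s1: int) -> int:
    """Recombine the two shares into the secret value."""
    return (s0 + s1) % P

def local_dot(shares: list[int], weights: list[int]) -> int:
    """Each party runs this locally on its own shares; no communication
    is needed because additive sharing commutes with linear operations."""
    return sum(s * w for s, w in zip(shares, weights)) % P

x = [3, 1, 4]   # private user input vector
w = [2, 7, 1]   # model weights (public in this simplified sketch)

pairs = [share(v) for v in x]          # dealer splits each input element
party0 = local_dot([p[0] for p in pairs], w)
party1 = local_dot([p[1] for p in pairs], w)

# Summing the two local results reconstructs the true dot product: 17
assert reconstruct(party0, party1) == sum(a * b for a, b in zip(x, w)) % P
```

Because addition and scalar multiplication commute with additive sharing, linear layers come almost for free; the nonlinear components of a Transformer (softmax, GELU, LayerNorm) are where the surveyed protocols spend most of their cryptographic effort and where the efficiency bottlenecks discussed above arise.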