🤖 AI Summary
This work addresses a limitation of existing secure cloud inference services such as Apple Private Cloud Compute and Google Private AI Compute: they rely on proprietary hardware and closed ecosystems, so others cannot adopt them. The paper analyzes the fundamental requirements for a secure yet open cloud inference service and presents OpenPCC, a confidential inference framework built on commercially available trusted execution environments (TEEs) rather than proprietary hardware. An open-source prototype is characterized end-to-end on a Llama-3 8B vLLM workload, with OpenPCC's own cost separated from that of the underlying TEE hardware. The authors report that their analysis and evaluation demonstrate the feasibility and security of the system.
📝 Abstract
Generative AI applications such as personal AI agents, image generators, and chat assistants offer advanced capabilities that improve the user experience. Behind the scenes, the Large Language Models (LLMs) that power these services require a massive amount of computation and are usually deployed in the cloud and exposed as APIs, meaning that a user's request has to be sent to a Cloud Inference Service (CIS) for processing. However, the strong capabilities of LLMs also mean that user requests now contain far more sensitive personal or confidential enterprise information, demanding equally strong protection in the CIS. While early industry efforts such as Apple Private Cloud Compute (PCC) and Google Private AI Compute have emerged to show the potential of a secure CIS, others cannot adopt them for deployment because they rely on proprietary hardware and closed ecosystems. In addition, each suffers from its own design flaws, which can undermine the ambitious goal of bringing true privacy protection to end users. In this paper, we present our analysis of the fundamental requirements for building a secure yet open CIS. We then present OpenPCC, a Confidential CIS framework that does not rely on proprietary hardware but instead uses commercially available TEEs. We implement an open-source prototype and characterize it end-to-end on a Llama-3 8B vLLM workload, separating OpenPCC's own cost from that of the underlying TEE hardware. Our analysis and evaluation demonstrate the feasibility and security of the system.