Managing Large Enclaves in a Data Center

Date: 2023-11-13
Venue: arXiv.org
Citations: 0
Influential citations: 0
AI Summary
Conventional stop-copy-resume mechanisms cause excessive downtime when migrating large trusted execution environment (TEE) enclaves, such as Intel SGX enclaves, in data centers. To address this, the paper proposes OptMig: the first near-zero-downtime secure migration framework explicitly designed around the constraint that enclave memory is invisible to the OS and hypervisor. Methodologically, OptMig integrates lightweight compiler instrumentation with a hardware-agnostic, incremental page-level write-capture mechanism, enabling verifiable secure state tracking and migration without modifying TEE firmware or relying on external monitoring. By extending the SGX runtime and supporting container, VM, and microVM platforms, OptMig achieves a 77-96% reduction in migration downtime for multi-gigabyte enclaves. The framework is validated end-to-end in a real cloud environment, demonstrating practical feasibility and security guarantees under production workloads.
๐Ÿ“ Abstract
Live migration of applications and VMs in data centers is an old and quintessential problem. Within this large body of work, an important open problem remains: the migration of secure enclaves (sandboxes) running on trusted execution environments (TEEs) like Intel SGX. Here, the decade-old stop-and-copy method is still used, in which the entire application's execution is stopped and its state is collected and transferred. This method has an exceedingly long downtime for enclaves with large memory footprints. Better solutions have eluded us because of design limitations posed by TEEs like Intel SGX, such as the opacity of data within enclaves (it is not visible to the OS/hypervisor) and the lack of mechanisms to track writes to secure pages. We propose a new technique, OptMig, to circumvent these limitations and implement secure enclave migration with near-zero downtime. We rely on a short compiler pass and propose a novel migration mechanism. Our optimizations reduce the total downtime by 77-96% for a suite of Intel SGX applications that have multi-GB memory footprints. We show results for our system on a real cloud and in settings that use containers, VMs, and microVMs.
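To make the downtime argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not OptMig's mechanism, which the abstract does not detail. It only contrasts stop-and-copy, where the application is paused for the whole transfer, with a generic iterative pre-copy scheme that assumes some way of tracking written pages exists. The function names, the constant dirty rate, and the default round limit and threshold are all hypothetical choices made for illustration.

```python
GIB = 2**30
MIB = 2**20


def stop_and_copy_downtime(mem_bytes, bandwidth):
    """Downtime (seconds) when execution is paused for the entire transfer."""
    return mem_bytes / bandwidth


def precopy_downtime(mem_bytes, bandwidth, dirty_rate,
                     max_rounds=30, stop_threshold=64 * MIB):
    """Downtime (seconds) for a generic iterative pre-copy model.

    Memory is copied while the application keeps running. Bytes written
    during a round must be resent in the next round. The application is
    paused only to send the final residual set.

    Assumption: writes can be tracked, and they arrive at a constant
    `dirty_rate` in bytes per second. Both are simplifications.
    """
    to_send = mem_bytes
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break
        round_time = to_send / bandwidth
        # Data dirtied during this round, capped at the full footprint.
        to_send = min(mem_bytes, dirty_rate * round_time)
    return to_send / bandwidth


if __name__ == "__main__":
    mem = 4 * GIB          # hypothetical enclave footprint
    bw = 1 * GIB           # hypothetical link bandwidth, bytes/second
    dirty = 0.1 * GIB      # hypothetical write rate, bytes/second
    print("stop-and-copy:", stop_and_copy_downtime(mem, bw), "s")
    print("pre-copy:     ", precopy_downtime(mem, bw, dirty), "s")
```

In this toy model, stop-and-copy downtime grows linearly with the memory footprint, while pre-copy downtime depends on how quickly the application dirties memory relative to the link bandwidth. If the dirty rate is at or above the bandwidth, the rounds never shrink and the model falls back to roughly stop-and-copy cost. This is why the lack of write tracking on secure pages, noted in the abstract, matters so much for large enclaves.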
Problem

Research questions and friction points this paper is trying to address.

Intel SGX
Secure Enclave Migration
Program Interruption
Innovation

Methods, ideas, or system contributions that make the work stand out.

OptMig
Intel SGX
Memory Migration Optimization
Sandeep Kumar
School of Information Technology, IIT Delhi, New Delhi, India
S. Sarangi
Department of Computer Science and Engineering, IIT Delhi, New Delhi, India