🤖 AI Summary
Traditional memory management in cloud environments suffers from high metadata overhead, architectural complexity, and poor stability, and existing software- and hardware-based optimizations struggle to achieve flexibility and low overhead at the same time. This paper proposes Vmem, a lightweight, online-upgradable memory management architecture. Vmem is the first production-ready solution that enables hot upgrades of the memory subsystem. It integrates lightweight reserved-memory management, VFIO-accelerated virtual machines, DPU-assisted offloading, and a dynamic upgrade mechanism. Experiments show that Vmem increases the sellable memory ratio by about 2%, accelerates VFIO VM startup by more than 3×, and improves VM network performance by about 10% under DPU acceleration. Deployed at scale on more than 300,000 cloud servers, Vmem supports the elastic scaling and rapid iteration that cloud services require.
📝 Abstract
Traditional memory management suffers from metadata overhead, architectural complexity, and stability degradation, and these problems are intensified in cloud environments. Existing software and hardware optimizations cannot meet cloud computing's dual demands of flexibility and low overhead. This paper presents Vmem, a memory management architecture for in-production cloud environments that enables flexible, efficient use of cloud server memory through lightweight reserved memory management. Vmem is the first such architecture to support online upgrades, meeting cloud requirements for high stability and rapid iterative evolution. Experiments show that Vmem increases the sellable memory rate by about 2% and delivers high elasticity and performance: VFIO-based virtual machines (VMs) boot more than 3× faster, and DPU-accelerated VMs see about 10% better network performance. Vmem has been deployed at large scale for seven years, demonstrating efficiency and stability on more than 300,000 cloud servers supporting hundreds of millions of VMs.
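The abstract does not break down where the roughly 2% gain in sellable memory comes from. As a rough illustration of how per-page metadata alone can reach that order of magnitude, the sketch below assumes a 64-byte descriptor for every 4 KiB page (typical of `struct page` in Linux on x86-64) and a hypothetical 384 GiB server. These figures and the function names are assumptions for illustration, not numbers or interfaces from the paper.

```python
def metadata_overhead_ratio(descriptor_bytes: int, page_bytes: int) -> float:
    """Fraction of memory consumed by per-page descriptors."""
    return descriptor_bytes / page_bytes


def metadata_gib(total_gib: float, descriptor_bytes: int = 64,
                 page_bytes: int = 4096) -> float:
    """Memory (GiB) spent on descriptors for a server of the given size.

    Defaults are assumptions: 64-byte descriptors and 4 KiB pages.
    """
    return total_gib * metadata_overhead_ratio(descriptor_bytes, page_bytes)


if __name__ == "__main__":
    ratio = metadata_overhead_ratio(64, 4096)
    print(f"per-page metadata overhead: {ratio:.4%}")
    # Hypothetical 384 GiB host, chosen only to make the scale concrete.
    print(f"metadata on a 384 GiB host: {metadata_gib(384)} GiB")
```

Under these assumptions the descriptors take about 1.56% of memory, so avoiding them for reserved guest memory would account for most of a 2% gain; the remainder would have to come from other savings that the abstract does not itemize.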