AgileOS: A GPU Operating System Layer for Protected CUDA Services

📅 2026-06-04

🤖 AI Summary

This work addresses the lack of system-level protection in the current CUDA programming model, which leaves GPU service metadata, device queues, MMIO regions, and library-internal state directly accessible to untrusted kernels. The paper presents the initial design and prototype scope of AgileOS, a GPU operating-system layer for protected CUDA services. AgileOS virtualizes CUDA at the library boundary: client-side interceptors work with a trusted runtime worker that owns the real CUDA context and mediates supported operations. A modular design, a memory-management model that separates user allocations from protected module/MMIO ranges, and PTX-injected memory access guards are used to enforce access control around protected services. The prototype supports existing libraries such as cuFFT and PyTorch and is designed to keep user code out of protected memory regions while preserving application compatibility.
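To make the idea of a worker-owned context concrete, here is a minimal C++17 sketch of a virtualized object table: the client only ever sees an opaque virtual handle, and the worker resolves it to the real object before acting. Every name here (`HandleTable`, `VirtualHandle`, `RealHandle`, the integer `client_id`) is hypothetical, and the per-client ownership check is an assumption for illustration, not something the paper's summary specifies.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical types: not the AgileOS API.
using VirtualHandle = std::uint64_t;   // opaque value handed to the client
using RealHandle    = std::uintptr_t;  // stands in for a real stream/module handle

class HandleTable {
public:
    // Register a real object owned by the worker; return the client-visible handle.
    VirtualHandle insert(int client_id, RealHandle real) {
        VirtualHandle v = next_++;
        entries_[v] = Entry{client_id, real};
        return v;
    }

    // Resolve a virtual handle. Unknown handles and handles that belong to
    // another client (assumed policy) both yield std::nullopt.
    std::optional<RealHandle> resolve(int client_id, VirtualHandle v) const {
        auto it = entries_.find(v);
        if (it == entries_.end() || it->second.owner != client_id) {
            return std::nullopt;
        }
        return it->second.real;
    }

    // Drop a handle; returns false if the caller does not own it.
    bool erase(int client_id, VirtualHandle v) {
        auto it = entries_.find(v);
        if (it == entries_.end() || it->second.owner != client_id) {
            return false;
        }
        entries_.erase(it);
        return true;
    }

private:
    struct Entry {
        int owner;
        RealHandle real;
    };
    std::unordered_map<VirtualHandle, Entry> entries_;
    VirtualHandle next_ = 1;  // 0 is left unused as an "invalid" value
};
```

Because the client never holds the real handle, a request that names a forged or stale virtual handle fails at lookup time inside the worker rather than reaching the driver.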

📝 Abstract

Modern GPU applications increasingly interact with storage systems, network devices, vendor libraries, and GPU-resident services rather than executing only isolated compute kernels. This shift creates a need for operating-system-like protection around GPU services, where service metadata, device queues, memory-mapped I/O regions, and library-internal state should not be directly exposed to untrusted application kernels. However, today's CUDA programming model, by default, still gives each application direct ownership of its CUDA context, device pointers, runtime handles, module loading path, and kernel launches, leaving protected GPU services to build their own ad hoc interfaces and isolation mechanisms. This paper presents the initial design and prototype scope of AgileOS, a GPU operating-system layer for protected CUDA services. AgileOS virtualizes CUDA at the library boundary: applications link against client-side CUDA Runtime, Driver, and selected library shims, while a trusted runtime worker owns the real CUDA context and mediates supported operations. To protect service state and module interfaces, AgileOS also defines a GPU memory-management model that separates user allocations from protected module/MMIO ranges, using pointer validation and memory access guards via PTX injection. AgileOS is modularized and flexible, supporting a range of protected services and existing libraries such as cuFFT and PyTorch. The prototype includes client-side interceptors, worker-side CUDA handlers, virtualized CUDA object tables, protected AgileOS modules, a GPU memory manager that separates user allocations from protected module/MMIO ranges, selected trusted library adapters, and the PTX-level kernel memory guard.
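The separation between user allocations and protected module/MMIO ranges can be illustrated with a small host-side pointer validator. This is only a sketch under stated assumptions: the class name `RangeGuard`, its methods, and the policy (an access must fit inside a single user allocation and touch no protected range) are hypothetical, and the abstract does not say how AgileOS implements its checks or its PTX-level guard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; not the AgileOS API.
struct Range {
    std::uintptr_t base;
    std::size_t size;
};

class RangeGuard {
public:
    void add_user(std::uintptr_t base, std::size_t size) {
        assert(size > 0 && base <= UINTPTR_MAX - size);  // no wrap-around
        user_.push_back({base, size});
    }

    void add_protected(std::uintptr_t base, std::size_t size) {
        assert(size > 0 && base <= UINTPTR_MAX - size);  // no wrap-around
        protected_.push_back({base, size});
    }

    // True only if [ptr, ptr + len) lies entirely inside one user allocation
    // and overlaps no protected range (assumed policy).
    bool user_access_ok(std::uintptr_t ptr, std::size_t len) const {
        if (len == 0 || ptr > UINTPTR_MAX - len) return false;
        for (const Range& p : protected_) {
            if (overlaps(p, ptr, len)) return false;
        }
        for (const Range& u : user_) {
            if (contains(u, ptr, len)) return true;
        }
        return false;
    }

private:
    static bool overlaps(const Range& r, std::uintptr_t ptr, std::size_t len) {
        return ptr < r.base + r.size && r.base < ptr + len;
    }

    static bool contains(const Range& r, std::uintptr_t ptr, std::size_t len) {
        if (ptr < r.base) return false;
        std::uintptr_t offset = ptr - r.base;
        return offset <= r.size && len <= r.size - offset;
    }

    std::vector<Range> user_;
    std::vector<Range> protected_;
};
```

A worker could run a check like this on pointer arguments before forwarding a call, while a kernel-side guard would need the equivalent test applied to the addresses the kernel computes at run time; the linear scan keeps the sketch short, and a real table would likely use a sorted or interval structure.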

Problem

Research questions and friction points this paper is trying to address.

GPU protection

CUDA services

memory isolation

operating system layer

secure GPU computing

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU operating system

CUDA virtualization

memory protection