🤖 AI Summary
To address the performance bottlenecks and architectural fragmentation arising from the mismatch between GPU-accelerated model training and CPU-centric secure verification in distributed machine learning, this paper proposes the first GPU-native encrypted integrity verification framework. Methodologically, it integrates cryptographic verification logic directly into the GPU execution pipeline, leverages specialized hardware units (e.g., XMX/Tensor Cores) for parallelized cryptographic computation, and introduces a hardware-agnostic unified verification primitive alongside a cross-vendor execution architecture. Key contributions include: (1) the first GPU-native, high-throughput integrity verification; (2) low-overhead verification for models exceeding 100 GB, achieving a ≥8.2× speedup over CPU-based approaches; and (3) a reserved secure-channel mechanism for seamless integration with trusted execution environments (TEEs). The framework improves both security guarantees and system efficiency in heterogeneous distributed training infrastructures.
📝 Abstract
We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms while significantly reducing verification overhead. Our approach co-locates integrity verification with large ML model execution on GPU accelerators, resolving the fundamental mismatch between how large ML workloads typically run (primarily on GPUs) and how security verification traditionally operates (in separate CPU-based processes), and delivering both immediate performance benefits and long-term architectural consistency. By performing cryptographic operations natively on GPUs using dedicated compute units (e.g., Intel Arc's XMX units, NVIDIA's Tensor Cores), our solution avoids the architectural bottlenecks that can plague traditional CPU-based verification systems when handling large models. The approach leverages the same high memory bandwidth and parallel-processing primitives that power ML workloads, ensuring integrity checks keep pace with model execution even for massive models exceeding 100 GB. The framework establishes a common integrity verification mechanism that works consistently across GPU vendors and hardware configurations. By anticipating future capabilities for creating secure channels between trusted execution environments and GPU accelerators, we provide a hardware-agnostic foundation that enterprise teams can deploy regardless of their underlying CPU and GPU infrastructure.
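The paper's GPU kernels are not shown here, but the core pattern behind parallelized integrity verification can be sketched on the CPU: hash fixed-size chunks of the model independently (the step a GPU would fan out across thread blocks) and fold the chunk digests into a single root commitment. This is a minimal illustrative sketch, not the paper's implementation; the chunk size and all names are assumptions, and real systems would use MB-scale chunks.

```python
import hashlib

CHUNK_SIZE = 4  # bytes per chunk; illustrative only -- real systems use MB-scale chunks

def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Hash fixed-size chunks independently -- the embarrassingly parallel step."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]

def merkle_root(digests: list[bytes]) -> bytes:
    """Combine chunk digests pairwise into one root commitment over the model."""
    level = list(digests)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last digest on odd-sized levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

model_bytes = b"example model weights"          # stand-in for a >100 GB weight blob
root = merkle_root(chunk_digests(model_bytes))  # 32-byte commitment to verify against
```

Because each chunk digest is independent, the first stage maps directly onto GPU parallelism, and any single-byte change to the weights changes the root, which is what makes low-overhead verification of very large models feasible.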