GPU Acceleration of Learning With Errors KEMs Using OpenACC for Post-Quantum Cryptography

📅 2026-05-31

🤖 AI Summary

This work addresses the high computational cost and limited scalability of post-quantum key encapsulation mechanisms (KEMs) based on plain Learning with Errors (LWE) when run on conventional CPUs. It does so by accelerating them on GPUs with the OpenACC directive-based programming model. The implementation offloads the core LWE-KEM operations and is evaluated on several NVIDIA GPU models and generations, in both bare-metal and containerized deployments. On the NVIDIA Grace Hopper Superchip, it achieves up to a 208× speedup over a multithreaded CPU baseline, and its energy efficiency is approximately 2× better than that of systems pairing x86-based CPUs with NVIDIA H100 GPUs. Because the GPU version is not limited by the memory and synchronization constraints that hold back the CPU version, it can also run problem sizes that are impractical on CPUs, which makes LWE-based KEMs more practical at scale.
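Much of the work in a plain-LWE KEM is dense matrix-vector arithmetic modulo q, such as computing b = A·s + e during key generation. The C sketch below shows how such a loop might be annotated with standard OpenACC directives. It is not the authors' code: the function name `lwe_matvec`, the row-major layout, and the assumption that q = 2^logq with logq ≤ 16 (as in FrodoKEM-style parameter sets) are illustrative choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel (not the authors' code): b = (A*s + e) mod q,
 * where A is an m x n matrix in row-major order and q = 2^logq with
 * 1 <= logq <= 16. The entries of s and e are assumed already reduced mod q. */
void lwe_matvec(size_t m, size_t n,
                const uint16_t *restrict A, const uint16_t *restrict s,
                const uint16_t *restrict e, uint16_t *restrict b,
                unsigned logq)
{
    assert(logq >= 1 && logq <= 16);
    const uint32_t mask = (1u << logq) - 1u; /* x mod q == x & mask */

    /* Outer loop: each row's dot product is independent, so rows are
     * distributed across gangs. The data clauses copy A, s, e to the
     * device and copy b back to the host. */
    #pragma acc parallel loop copyin(A[0:m*n], s[0:n], e[0:m]) copyout(b[0:m])
    for (size_t i = 0; i < m; i++) {
        uint32_t sum = 0;
        /* Inner loop: parallel sum reduction over the row. Unsigned 32-bit
         * wraparound does not affect the result because q divides 2^32. */
        #pragma acc loop reduction(+:sum)
        for (size_t j = 0; j < n; j++) {
            sum += (uint32_t)A[i * n + j] * s[j];
        }
        b[i] = (uint16_t)((sum + e[i]) & mask);
    }
}
```

Built with an OpenACC compiler (for example `nvc -acc` from the NVIDIA HPC SDK, or `gcc -fopenacc`), the loop nest is offloaded to the GPU; a compiler without OpenACC support ignores the pragmas and runs the same code serially on the CPU. Restricting q to a power of two keeps the modular reduction a single bitwise AND, which avoids division on the device.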

📝 Abstract

Shor's algorithm showed that asymmetric cryptographic protocols based on the integer factorization and discrete logarithm problems will no longer be secure once large-scale quantum computers exist. As a result, Post-Quantum Cryptography (PQC) has been developed over the last few years, seeking cryptographic primitives resistant to quantum attacks. One of the main hard problems underlying PQC schemes is the Learning with Errors (LWE) problem, and schemes built on it are significantly more computationally intensive than their classical predecessors. In this work, we present a Key Encapsulation Mechanism (KEM) based on plain LWE and develop a GPU-oriented implementation using OpenACC. We evaluate the performance of our accelerated application in terms of both time-to-solution and energy-to-solution, considering bare-metal and containerized executions across multiple NVIDIA GPU models and generations. Our implementation achieves significant acceleration across all tested GPU platforms. In particular, on the NVIDIA Grace Hopper Superchip, it attains up to a 208× speedup over a multithreaded CPU baseline and enables the execution of problem sizes that are impractical on CPU architectures due to memory and synchronization constraints. Energy consumption analysis also shows approximately 2× better efficiency when using the Superchip compared to systems equipped with x86-based CPUs and NVIDIA H100 GPUs. These results highlight the effectiveness of GPU acceleration for computationally demanding LWE-based cryptographic workloads.
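The abstract does not spell out the scheme, so the following is a hedged sketch of a textbook Lindner-Peikert-style plain-LWE encryption of a single bit, the kind of building block on which plain-LWE KEMs such as FrodoKEM are based. The authors' exact construction, parameters, and error distribution χ may differ. The sketch also shows where the cost comes from: without ring or module structure, A is a dense n × n matrix over Z_q, so key generation and encryption each need O(n²) multiply-accumulate operations modulo q.

```latex
\begin{aligned}
\textbf{KeyGen:}\quad & A \leftarrow \mathcal{U}(\mathbb{Z}_q^{n \times n}),\quad s, e \leftarrow \chi^{n},\quad b = A s + e \bmod q \\
\textbf{Encrypt}(m \in \{0,1\}):\quad & r, e' \leftarrow \chi^{n},\quad e'' \leftarrow \chi,\quad u = A^{\top} r + e',\quad v = b^{\top} r + e'' + m \lfloor q/2 \rfloor \\
\textbf{Decrypt:}\quad & v - s^{\top} u \equiv m \lfloor q/2 \rfloor + \underbrace{e^{\top} r - s^{\top} e' + e''}_{\text{noise}} \pmod{q} \\
& \hat{m} = \begin{cases} 1 & \text{if } \lvert v - s^{\top} u \rvert_q > q/4 \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
```

Here |x|_q denotes the absolute value of the representative of x in (-q/2, q/2]. The decryption identity follows from b^T r = s^T A^T r + e^T r, so the A-dependent terms cancel, and decryption is correct whenever the accumulated noise has magnitude below q/4. A KEM typically encrypts a random key this way (bit by bit, or with matrix-valued secrets to amortize the cost) and derives the shared secret by hashing, usually through a Fujisaki-Okamoto-style transform.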

Problem

Research questions and friction points this paper is trying to address.

Post-Quantum Cryptography

Learning with Errors

Key Encapsulation Mechanism

GPU Acceleration

Computational Intensity

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU acceleration

Learning with Errors (LWE)

OpenACC