GPU Acceleration of Learning With Errors KEMs Using OpenACC for Post-Quantum Cryptography

📅 2026-05-31
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high computational cost and limited scalability of plain Learning with Errors (LWE)-based post-quantum key encapsulation mechanisms (KEMs) on conventional CPUs by developing a GPU-oriented implementation with the OpenACC parallel programming model. The implementation is evaluated across multiple NVIDIA GPU models and generations, in both bare-metal and containerized executions, measuring time-to-solution and energy-to-solution. On the NVIDIA Grace Hopper Superchip, it achieves up to a 208× speedup over a multithreaded CPU baseline and runs problem sizes that are impractical on CPUs because of memory and synchronization constraints. The Superchip is also approximately 2× more energy efficient than systems that pair x86-based CPUs with NVIDIA H100 GPUs. These results indicate that GPU acceleration is effective for computationally demanding LWE-based KEMs.
📝 Abstract
Shor's algorithm proved that asymmetric cryptographic protocols based on the integer factorization and discrete logarithm problems are no longer safe in a world with large-scale quantum computers. As a result, Post-Quantum Cryptography (PQC) has been developed over the last few years, seeking cryptographic primitives resistant to quantum attacks. One of the main hard problems underlying PQC schemes is the Learning with Errors (LWE) problem, which is significantly more computationally intensive than its classical predecessors. In this work, we present a Key Encapsulation Mechanism (KEM) based on plain LWE and develop a GPU-oriented implementation using OpenACC. We evaluate the performance of our accelerated application in terms of both time-to-solution and energy-to-solution, considering bare-metal and containerized executions across multiple NVIDIA GPU models and generations. Our implementation achieves significant acceleration across all tested GPU platforms. In particular, on the NVIDIA Grace Hopper Superchip, it attains up to a $208\times$ speedup over a multithreaded CPU baseline and enables the execution of problem sizes that are impractical on CPU architectures due to memory and synchronization constraints. Energy consumption analysis also shows $\approx 2\times$ better efficiency when using the Superchip compared to systems equipped with x86-based CPUs and NVIDIA H100 GPUs. These results highlight the effectiveness of GPU acceleration for computationally demanding LWE-based cryptographic workloads.
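To make the workload concrete, here is a minimal sketch of the matrix-vector product b = (A·s + e) mod q, the operation at the core of plain LWE schemes, annotated with OpenACC directives. It is not the authors' implementation: the function name `lwe_matvec`, the row-major layout, the 32-bit element type, and the choice to parallelize only the outer loop are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the paper): computes b = (A*s + e) mod q.
 * A is an n x n matrix stored row-major; s, e, b are length-n vectors.
 * Assumes q != 0; all inputs are reduced mod q before use. */
void lwe_matvec(const uint32_t *A, const uint32_t *s, const uint32_t *e,
                uint32_t *b, size_t n, uint32_t q)
{
    assert(q != 0);

    /* Each output row is independent, so the outer loop is offloaded.
     * The data clauses copy inputs to the device and the result back. */
    #pragma acc parallel loop copyin(A[0:n*n], s[0:n], e[0:n]) copyout(b[0:n])
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = e[i] % q;

        /* Inner dot product kept sequential per row for simplicity. */
        #pragma acc loop seq
        for (size_t j = 0; j < n; j++) {
            uint64_t prod = (uint64_t)(A[i * n + j] % q) * (s[j] % q);
            /* sum < q < 2^32 and prod < 2^64 - 2^33 + 1, so no overflow. */
            sum = (sum + prod) % q;
        }
        b[i] = (uint32_t)sum;
    }
}
```

With an OpenACC-capable compiler (for example `nvc -acc` or `gcc -fopenacc`) the outer loop can be offloaded to the GPU; compilers without OpenACC support ignore the pragmas and run the same code serially, which makes the sketch easy to check on a CPU first.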
Problem

Research questions and friction points this paper is trying to address.

Post-Quantum Cryptography
Learning with Errors
Key Encapsulation Mechanism
GPU Acceleration
Computational Intensity
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU acceleration
Learning with Errors (LWE)
OpenACC
Post-Quantum Cryptography
Key Encapsulation Mechanism (KEM)
🔎 Similar Papers
Tiziana Liberati
E4 Computer Engineering SpA, via Martiri della Libertà 66, 42019 Scandiano (Italy)
Nitin Shukla
SuperComputing Applications and Innovation Department, Cineca, via Magnanelli 6/3, 40033 Bologna (Italy)
Matteo Barbieri
E4 Computer Engineering SpA, via Martiri della Libertà 66, 42019 Scandiano (Italy)
Gabriella Bettonte
E4 Computer Engineering SpA, via Martiri della Libertà 66, 42019 Scandiano (Italy)
Elisabetta Boella
E4 Computer Engineering SpA, via Martiri della Libertà 66, 42019 Scandiano (Italy)
Simone Rizzo
E4 Computer Engineering SpA, via Martiri della Libertà 66, 42019 Scandiano (Italy)
Daniele Gregori
E4 Computer Engineering SpA, via Martiri della Libertà 66, 42019 Scandiano (Italy)
Marco Pedicini
Associate Professor of Logic in Computer Science, Roma Tre University
theoretical computer science, cryptography, systems biology, computational number theory