Implementation and Optimization of HQC Decoding on NPU-Integrated Devices

2026-06-01

AI Summary
This work addresses the high computational cost of HQC decoding on mobile and embedded devices, which is dominated by its Reed–Muller and Reed–Solomon components. Targeting Qualcomm Hexagon processors in NPU-integrated devices, and specifically their Hexagon Vector eXtensions (HVX) rather than the tensor-inference engine, the study redesigns the dominant decoding kernels for vector execution. Key contributions include HVX-friendly designs for the Hadamard transform, peak selection, finite-field arithmetic, syndrome computation, and locator-root evaluation, alongside optimized data layouts and execution patterns. Evaluated with Hexagon simulator measurements and on a Snapdragon 8 Gen 2 hardware development kit, the approach improves energy efficiency by up to 18.13×, reducing both latency and energy consumption while offloading work from the host CPU.
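The summary names the Hadamard transform and peak selection used for Reed–Muller decoding but gives no formulas. The following is a minimal scalar Python sketch of the textbook approach for a duplicated first-order Reed–Muller code RM(1, m): sum the ±1 reliabilities across copies, apply a fast Walsh–Hadamard transform, and pick the coefficient of largest magnitude. The bit ordering, the multiplicity of 3, and all function names are illustrative assumptions, not the paper's or HQC's reference layout, and this is not HVX code; it only shows the data flow a vector unit could parallelize.

```python
# Sketch (assumed layout): soft decoding of a duplicated RM(1, m) code with a
# fast Walsh-Hadamard transform and first-maximum peak selection.

def rm1_encode(a0, a, m):
    """Codeword bit x is a0 XOR <a, x>, i.e. a0 XOR parity(a & x)."""
    return [a0 ^ (bin(a & x).count("1") & 1) for x in range(1 << m)]

def reliability(copies):
    """Reliability vector: sum of +/-1 images (bit 0 -> +1, bit 1 -> -1) over all copies."""
    return [sum(1 - 2 * bits[x] for bits in copies) for x in range(len(copies[0]))]

def fht(v):
    """Natural-order fast Walsh-Hadamard transform: F[u] = sum_x v[x] * (-1)^<u,x>."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):  # butterflies in one stage are independent
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def select_peak_scalar(F):
    """Reference loop: the first index with the largest |F[u]| wins ties."""
    best = 0
    for u in range(1, len(F)):
        if abs(F[u]) > abs(F[best]):
            best = u
    return best

def select_peak_vector_style(F):
    """Reduction-style variant: max-reduce |F|, then take the lowest matching
    index, so it returns the same index as the scalar loop, ties included."""
    mags = [abs(f) for f in F]
    return mags.index(max(mags))

def rm1_decode(copies, m):
    """Return (a0, a): the peak index gives a, the peak's sign gives a0."""
    F = fht(reliability(copies))
    u = select_peak_vector_style(F)
    return (1 if F[u] < 0 else 0), u

m = 7
cw = rm1_encode(1, 0b1011001, m)
copies = [list(cw) for _ in range(3)]  # multiplicity 3 (illustrative)
for x in (0, 5, 17, 33, 64, 100, 127):  # bit errors in copy 0
    copies[0][x] ^= 1
for x in (5, 40):                       # bit errors in copy 1
    copies[1][x] ^= 1
print(rm1_decode(copies, m))  # (1, 89)
```

Each butterfly stage of `fht` applies the same add/subtract to independent pairs, which is why the transform maps naturally onto wide vector lanes. Both peak selectors keep the lowest index that attains the maximum magnitude, so they agree even on ties; reading "scalar-equivalent peak selection" this way is an assumption made for the sketch.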
Abstract
Hamming Quasi-Cyclic (HQC) has been selected by NIST for standardization as an additional code-based key-encapsulation mechanism, providing algorithmic diversity alongside lattice-based post-quantum cryptography. Efficient deployment of HQC on mobile and embedded platforms, however, requires careful optimization of its decoding procedure, whose Reed-Muller and Reed-Solomon components dominate the computational cost. This paper studies HQC decoding on Qualcomm Hexagon processors in NPU-integrated devices, focusing on the Hexagon Vector eXtensions (HVX) backend rather than a tensor-inference engine. We observe that HQC decoding naturally exposes vector-structured computation, including Reed-Muller reliability vectors, Hadamard-transform coefficients, Reed-Solomon syndrome vectors, finite-field products, and packed support-point evaluations. Based on this observation, we redesign the dominant decoding kernels around HVX-friendly data layouts and execution patterns, including a vectorized Reed-Muller Hadamard transform, scalar-equivalent peak selection, HVX-oriented finite-field arithmetic, vectorized syndrome computation, and shortened-support locator-root evaluation. We implement and evaluate the optimized decoder using both Hexagon simulator measurements and real-device experiments on a Snapdragon 8 Gen 2 hardware development kit. The results show that Hexagon/HVX-assisted decoding substantially reduces latency and energy consumption, improving energy efficiency by up to 18.13× while significantly offloading host CPU work. These results indicate that NPU-integrated mobile platforms can serve as effective backends for structured post-quantum cryptographic decoding when the underlying kernels are reformulated around vector execution.
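The abstract also lists finite-field arithmetic, vectorized syndrome computation, and shortened-support locator-root evaluation without giving formulas. As a hedged illustration only, the Python sketch below assumes GF(2^8) defined by x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with primitive element α = 2, syndromes S_j = r(α^j) for j = 1 to 2δ, and illustrative sizes n1 = 46 and 2δ = 30. It contrasts a product-at-a-time syndrome loop with a lane-wise Horner form, then evaluates an error-locator polynomial only at the n1 support points of the shortened code. Deriving the locator from the syndromes (for example with Berlekamp–Massey) is omitted, and none of the names come from the paper.

```python
# Sketch (assumed parameters): GF(2^8) arithmetic, Reed-Solomon syndromes, and
# locator-root evaluation restricted to the shortened support.

GF_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)
ALPHA = 0x02     # primitive element for GF_POLY

def gf_mul(a, b):
    """Shift-and-add (carry-less) multiply in GF(2^8), reduced by GF_POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
    return r

def gf_pow(a, e):
    """a^e for nonzero a; exponents are reduced mod 255 (negative e allowed)."""
    r = 1
    for _ in range(e % 255):
        r = gf_mul(r, a)
    return r

def rs_generator(two_delta):
    """g(x) = prod_{j=1}^{2*delta} (x + alpha^j); coefficients low degree first."""
    g = [1]
    for j in range(1, two_delta + 1):
        root = gf_pow(ALPHA, j)
        nxt = [0] * (len(g) + 1)
        for k, c in enumerate(g):
            nxt[k] ^= gf_mul(c, root)  # c * root (subtraction is XOR here)
            nxt[k + 1] ^= c            # c * x
        g = nxt
    return g

def syndromes_scalar(r, two_delta):
    """S_j = sum_i r_i * alpha^(i*j), computed one product at a time."""
    out = []
    for j in range(1, two_delta + 1):
        s = 0
        for i, ri in enumerate(r):
            s ^= gf_mul(ri, gf_pow(ALPHA, i * j))
        out.append(s)
    return out

def syndromes_lanewise(r, two_delta):
    """Same values by Horner's rule with one lane per syndrome: each step
    multiplies every lane by its own constant alpha^j and XORs in the
    broadcast symbol, a shape a SIMD unit can process in parallel."""
    steps = [gf_pow(ALPHA, j) for j in range(1, two_delta + 1)]
    acc = [0] * two_delta
    for ri in reversed(r):
        acc = [gf_mul(s, a) ^ ri for s, a in zip(acc, steps)]
    return acc

def locator_from_positions(positions):
    """Lambda(x) = prod_k (1 + alpha^{i_k} x); its roots are alpha^{-i_k}."""
    lam = [1]
    for i in positions:
        xk = gf_pow(ALPHA, i)
        nxt = lam + [0]
        for k, c in enumerate(lam):
            nxt[k + 1] ^= gf_mul(c, xk)
        lam = nxt
    return lam

def poly_eval(p, x):
    """Horner evaluation of p (low degree first) at x."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc

def error_positions_shortened(lam, n1):
    """Test only alpha^{-i} for the n1 positions the shortened code uses,
    instead of a full Chien search over all 255 nonzero field elements."""
    return [i for i in range(n1) if poly_eval(lam, gf_pow(ALPHA, -i)) == 0]

n1, two_delta = 46, 30  # illustrative sizes
codeword = rs_generator(two_delta) + [0] * (n1 - two_delta - 1)
received = list(codeword)
received[3] ^= 0x57     # inject two symbol errors
received[20] ^= 0x01
print(syndromes_scalar(codeword, two_delta) == [0] * two_delta)  # True
print(syndromes_lanewise(received, two_delta) == syndromes_scalar(received, two_delta))  # True
print(error_positions_shortened(locator_from_positions([3, 20]), n1))  # [3, 20]
```

The lane-wise form trades 2δ independent accumulation loops for a single loop over received symbols whose body is identical across lanes, which is the kind of reformulation the abstract describes. How the paper actually maps GF(2^8) multiplication onto HVX is not stated here and is not implied by this sketch.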
Problem

HQC decoding
NPU-integrated devices
post-quantum cryptography
vectorized computation
energy efficiency
Innovation

HQC decoding
HVX vectorization
post-quantum cryptography
NPU-integrated devices
finite-field arithmetic
Vu Minh Chau
School of Information and Communication Technology, Hanoi University of Science and Technology, Vietnam
Nguyen Ngoc Kiet
School of Information and Communication Technology, Hanoi University of Science and Technology, Vietnam
Pham Quang Minh
School of Information and Communication Technology, Hanoi University of Science and Technology, Vietnam
Mai Xuan Ngoc
School of Information and Communication Technology, Hanoi University of Science and Technology, Vietnam
Nguyen Duc Anh
School of Information and Communication Technology, Hanoi University of Science and Technology, Vietnam
Hoang Ta
National University of Singapore