🤖 AI Summary
To address the high computational overhead and practical deployment challenges of the NIST-standardized post-quantum cryptography scheme HQC on CPU platforms, this paper proposes an algorithm-architecture co-optimization methodology. It introduces, for the first time, a systematic integration of sparse vector operations, AVX2 instruction-level acceleration, and lookup-table (LUT)-driven restructuring of multiplication. These target critical bottlenecks, including syndrome computation and error recovery. The approach features a memory-aware LUT design, customized memory access patterns, and optimized hash computation to improve end-to-end efficiency across key generation, encryption, and decryption. Evaluated on mainstream x86 processors, the implementation achieves an average 55% speedup over the official reference implementation, significantly outperforming prior work. The result is a high-throughput, low-latency HQC implementation that advances the scheme's practical adoption and engineering deployment.
📝 Abstract
As post-quantum cryptography (PQC) becomes increasingly critical for securing future communication systems, the performance overhead of quantum-resistant algorithms presents a major computing challenge. HQC (Hamming Quasi-Cyclic) is a newly standardized code-based PQC scheme designed to replace classical key exchange methods. In this paper, we propose OptHQC, an optimized implementation of the HQC scheme that delivers high-performance cryptographic operations. Our approach provides a comprehensive analysis of each computational block in HQC and introduces optimizations across all three stages: key generation, encryption, and decryption. We first exploit data-level sparsity in vector multiplication to accelerate polynomial operations during vector generation. We then apply instruction-level acceleration (e.g., AVX2) to hash computation to further improve performance. Finally, we transform multiplication into lookup-table indexing and optimize memory access patterns in syndrome computation and error vector recovery, which are the most computationally intensive operations in HQC. Overall, OptHQC achieves an average 55% speedup over the reference HQC implementation on CPUs.
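The sparsity idea can be illustrated with a minimal C sketch. It is not OptHQC's code. Several HQC products pair a dense polynomial with a sparse one of small Hamming weight, for example h·y in key generation, in the ring GF(2)[x]/(x^n - 1). Instead of a full dense multiplication, one can XOR one shifted copy of the dense operand per nonzero coefficient of the sparse operand. The high half is then folded back onto the low half. The function name `sparse_dense_mul`, the support-list representation of the sparse operand, and the caller-supplied scratch buffer are all assumptions made for this sketch. The sketch is also not constant-time, because its memory accesses depend on the (secret) support positions. A real HQC implementation must avoid that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of 64-bit words needed to hold an n-bit vector. */
#define WORDS(n) (((n) + 63) / 64)

/* Hypothetical helper: out = h * y mod (x^n - 1) over GF(2).
   h, out:  n-bit vectors packed little-endian into WORDS(n) uint64_t words,
            with all bits at positions >= n equal to zero.
   support: the w positions of the 1 bits of the sparse operand y.
   buf:     scratch space of 2 * WORDS(n) words.
   Not constant-time: memory accesses depend on the support positions. */
void sparse_dense_mul(uint64_t *out, const uint64_t *h, const uint32_t *support,
                      size_t w, size_t n, uint64_t *buf)
{
    size_t W = WORDS(n);
    memset(buf, 0, 2 * W * sizeof *buf);

    /* Unreduced product: XOR in h shifted left by each support position. */
    for (size_t k = 0; k < w; k++) {
        assert(support[k] < n);
        size_t q = support[k] / 64, r = support[k] % 64;
        for (size_t j = 0; j < W; j++) {
            buf[j + q] ^= h[j] << r;
            if (r != 0)
                buf[j + q + 1] ^= h[j] >> (64 - r);
        }
    }

    /* Reduce modulo x^n - 1: bit i + n folds back onto bit i. The unreduced
       product has degree <= 2n - 2, so a single fold suffices. */
    size_t nq = n / 64, nr = n % 64;
    for (size_t j = 0; j < W; j++) {
        uint64_t hi = buf[j + nq] >> nr;
        if (nr != 0)
            hi |= buf[j + nq + 1] << (64 - nr);
        out[j] = buf[j] ^ hi;
    }
    if (nr != 0)
        out[W - 1] &= (UINT64_C(1) << nr) - 1; /* clear bits >= n */
}
```

For example, take n = 70, h = 1 + x^69 (words `{1, 1 << 5}`), and support {1}. The product is x + x^70, which reduces to 1 + x, so `out[0] == 3` and `out[1] == 0`. Each support position costs one pass over the dense words. When the weight w is small, this is far cheaper than a generic n-by-n product.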
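The lookup-table idea can be sketched in C as well. HQC's Reed-Solomon decoding works over GF(2^8). This sketch assumes the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with generator α = 2. It replaces every field multiplication with log/antilog table lookups and uses those tables for the two steps named above. The first is syndrome computation, S_i = c(α^i). The second is an exhaustive (Chien-style) search for error positions from an error-locator polynomial σ(x). The table layout, the function names, and the use of a plain exhaustive search are illustrative assumptions. They are not necessarily how the HQC reference code or OptHQC performs these steps. The data-dependent branches also make this sketch non-constant-time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1; alpha = 2 generates GF(2^8)*. */
#define GF_POLY 0x11D

static uint8_t gf_exp[255]; /* gf_exp[i] = alpha^i */
static uint8_t gf_log[256]; /* gf_log[a] = i with alpha^i = a, for a != 0 */

/* Fill the antilog/log tables; call once before the functions below. */
void gf_init(void)
{
    unsigned v = 1;
    for (unsigned i = 0; i < 255; i++) {
        gf_exp[i] = (uint8_t)v;
        gf_log[v] = (uint8_t)i;
        v <<= 1;          /* multiply by alpha = x */
        if (v & 0x100)
            v ^= GF_POLY; /* reduce modulo the field polynomial */
    }
}

/* Field multiplication by table indexing: a * b = alpha^(log a + log b). */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return gf_exp[(gf_log[a] + gf_log[b]) % 255];
}

/* Syndromes S_i = c(alpha^i), i = 1..nsyn, of a received word c_0..c_{len-1}.
   Each nonzero term c_j * alpha^(i*j) costs one log read and one exp read. */
void rs_syndromes(uint8_t *syn, size_t nsyn, const uint8_t *c, size_t len)
{
    assert(len <= 255);
    for (size_t i = 1; i <= nsyn; i++) {
        uint8_t s = 0;
        for (size_t j = 0; j < len; j++)
            if (c[j] != 0)
                s ^= gf_exp[(gf_log[c[j]] + (i * j) % 255) % 255];
        syn[i - 1] = s;
    }
}

/* Position j is in error iff sigma(alpha^(-j)) == 0, where
   sigma(x) = sigma[0] + sigma[1] x + ... + sigma[deg] x^deg.
   Sets err[j] to 1 for each root found (0 otherwise) and returns the count. */
size_t find_error_positions(uint8_t *err, const uint8_t *sigma, size_t deg, size_t len)
{
    assert(len <= 255);
    size_t count = 0;
    for (size_t j = 0; j < len; j++) {
        uint8_t acc = sigma[0];
        for (size_t k = 1; k <= deg; k++)
            if (sigma[k] != 0) /* sigma_k * alpha^(-j*k) via the tables */
                acc ^= gf_exp[(gf_log[sigma[k]] + 255 - (j * k) % 255) % 255];
        err[j] = (acc == 0);
        count += err[j];
    }
    return count;
}
```

As a usage example, σ(x) = (1 + α^2 x)(1 + α^7 x) = 1 + 0x84·x + 0x3A·x^2 vanishes at α^-2 and α^-7. Passing the coefficients {1, 0x84, 0x3A} to `find_error_positions` therefore flags positions 2 and 7. The tables replace a bit-serial shift-and-reduce multiplication with two or three byte-sized memory reads. This is why the layout and access pattern of such tables matter for performance.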