WHET: Welding Homomorphic Encryption to Accelerator Architectures

📅 2026-06-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Fully homomorphic encryption (FHE) remains challenging to execute efficiently because of its substantial computational and memory overhead, a problem exacerbated by a longstanding disconnect between cryptographic optimizations and hardware design. This work proposes WHET, a memory-centric, architecture-aware hardware-software co-optimization approach that aligns CKKS constructions with accelerator design, reducing off-chip memory traffic and temporary data storage. Key innovations include an accelerator-oriented fine-grained coefficient-to-slot transformation, plaintext compression, intermediate modulus raising, a special-purpose on-chip buffer, and functional unit extensions. Together, these enhancements enable sub-millisecond CKKS bootstrapping for the first time. Compared with state-of-the-art FHE accelerators, the proposed design achieves 1.38× to 8.74× higher performance per unit area.

📝 Abstract

Fully homomorphic encryption (FHE) enables computations on encrypted data without decryption, offering strong data privacy at the expense of substantial computational and memory overheads. Prior efforts have steadily improved FHE performance through cryptographic and algorithmic enhancements or hardware acceleration, yet these two directions have progressed largely in isolation, hindering the full exploitation of available hardware capabilities. This work presents WHET, which introduces memory-centric, architecture-aware optimizations to better align cryptographic and algorithmic constructions with FHE accelerator architectures. We identify conventional FHE constructions as major sources of excessive working sets and heavy off-chip memory traffic. We propose accelerator-specific techniques, including fine-grained coefficient-to-slot transformation, plaintext compression, and intermediate modulus raising, to reduce the on-chip data footprint by minimizing temporary ciphertexts and plaintext loads. With these techniques applied, we observe additional opportunities to improve on-chip memory efficiency; hence, we introduce lightweight architectural refinements, including a special-purpose buffer and functional unit extensions. With these optimizations, WHET achieves 1.38× to 8.74× per-area performance improvements over state-of-the-art FHE accelerators and the first-ever sub-millisecond CKKS bootstrapping.

Problem

Research questions and friction points this paper is trying to address.

Fully Homomorphic Encryption

Hardware Acceleration

Memory Overhead

Accelerator Architecture

CKKS Bootstrapping

Innovation

Methods, ideas, or system contributions that make the work stand out.

Homomorphic Encryption

Hardware Acceleration

Memory-Centric Optimization