🤖 AI Summary
To address the high computational overhead and memory bottlenecks of fully homomorphic encryption (FHE) in sparse-feature embedding lookup for deep learning recommendation models (DLRMs), this paper proposes a privacy-preserving, end-to-end encrypted inference framework. The method introduces: (1) an efficient compressed embedding lookup mechanism that reduces FHE ciphertext dimensionality; (2) a multi-embedding parallel packing strategy that improves ciphertext slot utilization; and (3) co-design of the MLP architecture to comply with FHE constraints. To the authors' knowledge, this is the first work enabling scalable encrypted embedding lookup at the tens-of-millions-of-parameters scale. Experiments on the UCI and Criteo datasets demonstrate a 77× speedup in embedding lookup over a state-of-the-art FHE-based approach, while fully supporting end-to-end encryption — from client-side input to output — throughout the recommendation inference pipeline.
📝 Abstract
Fully Homomorphic Encryption (FHE) is an encryption scheme that not only encrypts data but also allows computations to be applied directly on the encrypted data. While computationally expensive, FHE can enable privacy-preserving neural inference in the client-server setting: a client encrypts their input with FHE and sends it to an untrusted server. The server then runs neural inference on the encrypted data and returns the encrypted results. The client decrypts the output locally, keeping both the input and result private from the server. Work on private inference has focused on networks with dense inputs, such as image classification, while networks with sparse features have received less attention. Unlike dense inputs, sparse features require efficient encrypted lookup operations into large embedding tables, which pose computational and memory challenges for FHE.
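To see why encrypted lookups are costly, note that an FHE server cannot branch or index on encrypted data: a lookup into an embedding table is typically expressed as a data-independent inner product between an encrypted one-hot vector and every table row. The plaintext mock-up below (no real HE library; names and dimensions are illustrative, not from the paper) sketches this baseline pattern whose cost grows with vocabulary size:

```python
import numpy as np

def onehot_lookup(index: int, table: np.ndarray) -> np.ndarray:
    """Select row `index` of `table` without data-dependent indexing.

    Under FHE, the client would encrypt `onehot`; the server computes the
    same multiply-accumulates homomorphically over ciphertexts, touching
    every row of the table and never learning `index`.
    """
    vocab_size, _ = table.shape
    onehot = np.zeros(vocab_size)
    onehot[index] = 1.0
    return onehot @ table  # one multiply-accumulate per table row

# Toy table: 4 vocabulary entries, embedding dimension 3.
table = np.arange(12, dtype=float).reshape(4, 3)
assert np.array_equal(onehot_lookup(2, table), table[2])
```

Because the work scales with the full table size rather than the single row actually needed, compressing the lookup (as the paper proposes) directly reduces the dominant FHE cost.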
In this paper, we explore the challenges and opportunities when applying FHE to Deep Learning Recommendation Models (DLRM) from both a compiler and systems perspective. DLRMs utilize conventional MLPs for dense features and embedding tables to map sparse, categorical features to dense vector representations. We develop novel methods for performing compressed embedding lookups in order to reduce FHE computational costs while keeping the underlying model performant. Our embedding lookup improves upon a state-of-the-art approach by $77\times$. Furthermore, we present an efficient multi-embedding packing strategy that enables us to perform a 44 million parameter embedding lookup under FHE. Finally, we integrate our solutions into the open-source Orion framework and present HE-LRM, an end-to-end encrypted DLRM. We evaluate HE-LRM on UCI (health prediction) and Criteo (click prediction), demonstrating that with the right compression and packing strategies, encrypted inference for recommendation systems is practical.
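The multi-embedding packing idea can be illustrated in plaintext. CKKS-style ciphertexts hold a fixed number of SIMD slots (often thousands), so placing a single short embedding vector in a ciphertext wastes most of its capacity; concatenating the looked-up embeddings from several tables into one slot vector amortizes the per-ciphertext cost. A minimal sketch, assuming a toy slot count (the helper names and layout are illustrative, not the paper's actual scheme):

```python
import numpy as np

SLOTS = 16  # toy slot count; real CKKS ciphertexts hold thousands of slots

def pack_embeddings(embs: list, slots: int = SLOTS) -> np.ndarray:
    """Concatenate several embedding vectors into one zero-padded slot vector.

    In the encrypted setting, each packed region would be produced by its own
    table lookup, then combined into a single ciphertext so downstream MLP
    layers operate on one ciphertext instead of one per table.
    """
    flat = np.concatenate(embs)
    assert flat.size <= slots, "embeddings exceed ciphertext slot capacity"
    return np.pad(flat, (0, slots - flat.size))

# Three dim-4 embeddings (e.g., from three categorical features).
embs = [np.full(4, float(i)) for i in range(3)]
packed = pack_embeddings(embs)
assert packed.size == SLOTS
assert np.array_equal(packed[4:8], embs[1])  # table 1's result sits in slots 4..7
```

Fixed slot offsets per table let later layers address each embedding's region with rotations and masks, which is what makes a single packed ciphertext usable by the downstream MLP.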