LaMoS: Enabling Efficient Large Number Modular Multiplication through SRAM-based CiM Acceleration

📅 2025-11-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing in-memory computing (CiM) architectures struggle to efficiently support large-integer modular multiplication—critical for cryptographic primitives such as RSA and ECC—due to two key limitations: (1) inherent bias toward low-bitwidth operations, hindering scalability to high-precision arithmetic; and (2) reliance on inefficient in-memory logic, resulting in high latency and substantial area overhead for wide-bit operands. This work presents the first mapping of the Barrett modular reduction algorithm onto an SRAM-based CiM architecture. We propose a workload-partitioning–driven, customized dataflow and computational optimization strategy to enhance scalability and energy efficiency at high bitwidths. Experimental results demonstrate that our design achieves a 7.02× speedup over state-of-the-art SRAM CiM accelerators, while significantly reducing both latency and area cost. The approach establishes a scalable, hardware-efficient paradigm for accelerating cryptographic workloads in memory.

📝 Abstract
Barrett's algorithm is one of the most widely used methods for performing modular multiplication, a critical nonlinear operation in modern privacy computing techniques such as homomorphic encryption (HE) and zero-knowledge proofs (ZKP). Since modular multiplication dominates the processing time in these applications, computational complexity and memory limitations significantly impact performance. Computing-in-Memory (CiM) is a promising approach to tackle this problem. However, existing schemes suffer from two main problems: 1) most works focus on low bit-width modular multiplication, which is inadequate for mainstream cryptographic algorithms such as elliptic curve cryptography (ECC) and RSA, both of which require high bit-width operations; 2) recent efforts targeting large-number modular multiplication rely on inefficient in-memory logic operations, resulting in high scaling costs at larger bit-widths and increased latency. To address these issues, we propose LaMoS, an efficient SRAM-based CiM design for large-number modular multiplication, offering high scalability and area efficiency. First, we analyze Barrett's modular multiplication method and map the workload onto SRAM CiM macros for high bit-width cases. Additionally, we develop an efficient CiM architecture and dataflow to optimize large-number modular multiplication. Finally, we refine the mapping scheme for better scalability in high bit-width scenarios using workload grouping. Experimental results show that LaMoS achieves a 7.02× speedup and reduces high bit-width scaling costs compared to existing SRAM-based CiM designs.
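To make the core operation concrete, the following is a minimal textbook sketch of Barrett modular multiplication: the expensive division by the modulus is replaced with a multiplication by a precomputed constant and a shift, plus at most a couple of correction subtractions. This illustrates the algorithm itself, not the paper's CiM mapping or dataflow; all names here are illustrative.

```python
def barrett_reduce(x, m, k, mu):
    # mu = floor(2^(2k) / m), precomputed once per modulus.
    # Valid for 0 <= x < m^2, where m has k bits.
    q = (x * mu) >> (2 * k)   # cheap estimate of floor(x / m)
    r = x - q * m             # r is within a few multiples of m
    while r >= m:             # at most two correction subtractions
        r -= m
    return r

def barrett_modmul(a, b, m):
    # Modular multiplication a * b mod m without a division by m.
    k = m.bit_length()
    mu = (1 << (2 * k)) // m  # precomputed reciprocal constant
    return barrett_reduce(a * b, m, k, mu)
```

In hardware, `mu` is computed once per modulus, so the per-multiplication cost reduces to two wide multiplications, a shift, and conditional subtractions, which is what makes the algorithm attractive for in-memory mapping.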
Problem

Research questions and friction points this paper is trying to address.

Accelerating high bit-width modular multiplication for cryptography
Overcoming inefficient in-memory logic operations for large numbers
Reducing computational complexity in homomorphic encryption and ZKP
Innovation

Methods, ideas, or system contributions that make the work stand out.

SRAM-based CiM design for large-number modular multiplication
Optimized Barrett algorithm mapping onto CiM macros
Workload grouping for high bit-width scalability
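The workload-grouping idea rests on a standard decomposition: a wide operand is split into narrow limbs so that a macro with small multipliers can produce the full product from partial products. The sketch below shows that plain multi-limb (schoolbook) decomposition; the paper's actual grouping and scheduling onto SRAM macros is more elaborate, and the limb width `w` here is purely illustrative.

```python
def to_limbs(x, w, n):
    # Split x into n limbs of w bits each, least-significant first.
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]

def limb_mul(a, b, w=8, n=32):
    # 256-bit operands as 32 x 8-bit limbs: each A[i] * B[j] is a
    # narrow multiplication a small CiM macro could perform; partial
    # products are shifted into place and accumulated.
    A, B = to_limbs(a, w, n), to_limbs(b, w, n)
    acc = 0
    for i in range(n):
        for j in range(n):
            acc += (A[i] * B[j]) << (w * (i + j))
    return acc
```

The n² partial products are independent, which is what lets a CiM design group them across macros and trade latency against area as the bit-width grows.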
Haomin Li
The Children's Hospital, Zhejiang University School of Medicine
Medical Informatics, Clinical Decision Support, Medical AI
Fangxin Liu
Shanghai Jiao Tong University
In-memory Computing, Brain-inspired Neuromorphic Computing
Chenyang Guan
School of Computer Science, Shanghai Jiao Tong University
Zongwu Wang
School of Computer Science, Shanghai Jiao Tong University; Shanghai Qi Zhi Institute
Li Jiang
School of Computer Science, Shanghai Jiao Tong University; Shanghai Qi Zhi Institute
Haibing Guan
School of Computer Science, Shanghai Jiao Tong University