AI Summary
This work addresses a gap in understanding the hardware-system-level performance trade-offs between the Number Theoretic Transform (NTT) and SumCheck in zero-knowledge proof systems. It presents what the authors describe as the first direct comparison of these two proving primitives under the same on-chip SRAM and off-chip bandwidth budgets, within a unified architectural framework and in the context of the ZeroCheck protocol. With optimized hardware systems implemented for both NTT and SumCheck, the study finds no universal winner: SumCheck generally outperforms NTT for high-degree polynomials, whereas for low-degree polynomials the outcome depends on memory availability, and under a given SRAM budget NTT can perform better on medium-sized workloads by exploiting data reuse. These findings offer practical guidance for hardware design choices in zero-knowledge proof systems.
Abstract
In the ZKP community, a long-standing argument holds that the SumCheck protocol is asymptotically more efficient than the Number Theoretic Transform (NTT), requiring only $O(N)$ arithmetic operations versus $O(N \log N)$. At the same time, hardware accelerator designers argue that NTT is more hardware-friendly, benefiting from locality and data reuse, while SumCheck suffers from sequential, dependent rounds. Despite these competing intuitions, the hardware-system-level trade-offs between NTT- and SumCheck-based proving primitives remain insufficiently understood.
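The two intuitions above can be made concrete with a small sketch. The Python below is an illustration only, not the accelerator design studied in this work: it runs the prover side of SumCheck for a single multilinear polynomial given by its evaluations on the Boolean hypercube, counts the field multiplications spent on folding, and compares that with the butterfly count of a radix-2 NTT of the same size. The modulus `P`, the function names, and the choice to count only fold multiplications are all assumptions made for this example.

```python
# Toy prime field. The modulus is an illustrative assumption, not from the paper.
P = 2**31 - 1


def sumcheck_prover(evals, challenges):
    """Prover side of SumCheck for one multilinear polynomial.

    evals      -- evaluations on {0,1}^n, first variable as the most significant bit
    challenges -- one verifier challenge per round (n values)
    Returns (round messages, fold multiplications, final folded value).
    """
    table = [e % P for e in evals]
    messages = []
    mults = 0
    for r in challenges:
        half = len(table) // 2
        # The round polynomial g is linear, so g(0) and g(1) describe it fully.
        g0 = sum(table[:half]) % P
        g1 = sum(table[half:]) % P
        messages.append((g0, g1))
        # Fold the table by fixing the current variable to r. This step needs r,
        # which is only available after the round message is produced, so the
        # rounds are inherently sequential.
        table = [(table[i] + r * (table[half + i] - table[i])) % P
                 for i in range(half)]
        mults += half  # one multiplication per folded entry
    return messages, mults, table[0]


def ntt_butterfly_count(n):
    """Butterflies in a radix-2 NTT of size n (n a power of two): (n/2) * log2(n)."""
    return (n // 2) * (n.bit_length() - 1)


if __name__ == "__main__":
    evals = [3, 1, 4, 1, 5, 9, 2, 6]   # N = 8, so n = 3 rounds
    challenges = [7, 11, 13]           # fixed here; random in a real protocol
    _, mults, _ = sumcheck_prover(evals, challenges)
    print("SumCheck fold multiplications:", mults)
    print("NTT butterflies:", ntt_butterfly_count(len(evals)))
```

The fold work halves every round, so it totals $N - 1$ multiplications, in line with the $O(N)$ claim, while the NTT performs $(N/2)\log_2 N$ butterflies. The sketch says nothing about memory traffic or data reuse, which are the system-level effects this comparison targets.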
Beyond individual accelerator design, this work presents, to our knowledge, the first direct hardware-system-level comparison of NTT- and SumCheck-based proving primitives under a unified architectural framework. We study them in the context of the ZeroCheck protocol, a common building block in zkSNARKs. We implement optimized systems for both primitives and evaluate both under the same on-chip SRAM and off-chip bandwidth budgets. Our results show that there is no universal winner. In general, SumCheck outperforms NTT for high-degree polynomials. For low-degree polynomials, performance depends on memory availability: under a given SRAM budget, NTT can deliver better performance for medium-sized workloads by exploiting data reuse.
These findings, bridging cryptographic protocol design and hardware architecture, offer practical guidance for understanding the proving cost of NTT- and SumCheck-based zero-knowledge proof systems.