Vectorised Hashing Based on Bernstein-Rabin-Winograd Polynomials over Prime Order Fields

📅 2025-07-08

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the efficiency of Almost-XOR-Universal (AXU) hash functions on modern CPUs. It proposes decBRWHash, a vectorised hash function built on Bernstein-Rabin-Winograd (BRW) polynomials and parameterised by a positive integer $c$. Choosing $c>1$ allows a $c$-way SIMD implementation. The authors give hand-optimised AVX2 assembly implementations of 4-decBRWHash, and of the ordinary-polynomial hash polyHash for comparison, over the prime fields $\mathbb{Z}_{2^{127}-1}$ and $\mathbb{Z}_{2^{130}-5}$. For $p = 2^{130}-5$, 4-decBRWHash is already faster than Poly1305 for messages of a few hundred bytes, with a speed-up of about 16% for kilobyte-length messages and about 23% for megabyte-length ones.
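Both primes are pseudo-Mersenne primes, which is what makes modular reduction cheap: since $2^{127} \equiv 1 \pmod{2^{127}-1}$ and $2^{130} \equiv 5 \pmod{2^{130}-5}$, the high part of a product can be folded back into the low part with an addition or a multiplication by 5. The Python sketch below only illustrates this folding identity on arbitrary-precision integers. The function names are hypothetical, and an AVX2 implementation would instead work on fixed-width limbs with explicit carries.

```python
# Sketch: reduction modulo the two pseudo-Mersenne primes named above.
# Python big integers stand in for the multi-limb arithmetic of a SIMD implementation.

P127 = (1 << 127) - 1
P130 = (1 << 130) - 5


def reduce_p127(x: int) -> int:
    """Reduce a non-negative x modulo 2^127 - 1 using 2^127 = 1 (mod p)."""
    while x >> 127:
        # x = hi * 2^127 + lo  is congruent to  lo + hi
        x = (x & P127) + (x >> 127)
    # Now x < 2^127; the only non-canonical value left is p itself.
    return 0 if x == P127 else x


def reduce_p130(x: int) -> int:
    """Reduce a non-negative x modulo 2^130 - 5 using 2^130 = 5 (mod p)."""
    mask = (1 << 130) - 1
    while x >> 130:
        # x = hi * 2^130 + lo  is congruent to  lo + 5 * hi
        x = (x & mask) + 5 * (x >> 130)
    # Now x < 2^130; values in [p, 2^130) need one final subtraction.
    return x - P130 if x >= P130 else x
```

For example, `reduce_p130((P130 - 1) ** 2)` returns the same value as `(P130 - 1) ** 2 % P130`, but uses only shifts, masks, additions and a small-constant multiplication.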


📝 Abstract

We introduce the new AXU hash function decBRWHash, which is parameterised by the positive integer $c$ and is based on Bernstein-Rabin-Winograd (BRW) polynomials. Choosing $c>1$ gives a hash function which can be implemented using $c$-way single instruction multiple data (SIMD) instructions. We report a comprehensive set of hand-optimised assembly implementations of 4-decBRWHash using the AVX2 SIMD instructions available on modern Intel processors. For comparison, we also report similarly carefully optimised AVX2 assembly implementations of polyHash, an AXU hash function based on ordinary polynomials. Our implementations are over prime order fields, specifically for the primes $2^{127}-1$ and $2^{130}-5$. For the prime $2^{130}-5$ and AVX2 implementations, 4-decBRWHash is faster than the well-known Poly1305 hash function for messages a few hundred bytes long; the speed-up is about 16% for message lengths of a few kilobytes and improves to about 23% for message lengths of a few megabytes.
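For readers unfamiliar with BRW polynomials, the recursive definition usually given in the literature is: $\mathrm{BRW}_\tau() = 0$, $\mathrm{BRW}_\tau(m_1) = m_1$, $\mathrm{BRW}_\tau(m_1,m_2) = m_1\tau + m_2$, $\mathrm{BRW}_\tau(m_1,m_2,m_3) = (\tau + m_1)(\tau^2 + m_2) + m_3$, and, for $t \in \{4, 8, 16, \dots\}$ with $t \le n < 2t$, $\mathrm{BRW}_\tau(m_1,\dots,m_n) = \mathrm{BRW}_\tau(m_1,\dots,m_{t-1})(\tau^t + m_t) + \mathrm{BRW}_\tau(m_{t+1},\dots,m_n)$. The Python sketch below evaluates this recursion over $\mathbb{Z}_p$. It is a non-authoritative illustration: the function name `brw`, the default $p = 2^{127}-1$, and the omission of message encoding, padding and any final output step are assumptions, and it does not show how decBRWHash itself combines BRW polynomials across its $c$ lanes.

```python
# Sketch: recursive evaluation of a BRW polynomial over Z_p (hypothetical helper).

P = (1 << 127) - 1  # 2^130 - 5 could be passed instead


def brw(tau: int, msg: list[int], p: int = P) -> int:
    """Evaluate BRW_tau(m_1, ..., m_n) mod p following the recursive definition."""
    n = len(msg)
    if n == 0:
        return 0
    if n == 1:
        return msg[0] % p
    if n == 2:
        return (msg[0] * tau + msg[1]) % p
    if n == 3:
        return ((tau + msg[0]) * (tau * tau + msg[1]) + msg[2]) % p
    # t is the largest power of two with t <= n (t >= 4 because n >= 4).
    t = 1 << (n.bit_length() - 1)
    left = brw(tau, msg[:t - 1], p)   # BRW(m_1, ..., m_{t-1})
    right = brw(tau, msg[t:], p)      # BRW(m_{t+1}, ..., m_n)
    # msg[t - 1] is m_t in the 1-based notation of the definition.
    return (left * (pow(tau, t, p) + msg[t - 1]) + right) % p
```

As a small check by hand, with $\tau = 2$ and blocks $1,2,3,4,5$ the recursion gives $((2+1)(4+2)+3)(2^4+4) + 5 = 425$, which is what `brw(2, [1, 2, 3, 4, 5])` returns.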

Problem

Research questions and friction points this paper is trying to address.

Develop decBRWHash, an AXU hash function designed for SIMD implementation

Compare decBRWHash performance with polyHash and Poly1305

Achieve speed-ups over Poly1305 for message lengths from a few hundred bytes to a few megabytes
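The general idea of splitting a polynomial hash into $c$ independent lanes can be illustrated on ordinary polynomial hashing (polyHash) with Horner's rule. For $n = qc$ blocks indexed from 0, $\sum_i m_i\tau^{n-1-i} = \sum_{j=0}^{c-1}\tau^{c-1-j}\sum_{k=0}^{q-1} m_{kc+j}(\tau^c)^{q-1-k}$, so each lane runs its own Horner loop with key $\tau^c$ and the lanes are combined at the end. This is a sketch under assumptions: the convention $m_1\tau^{n-1} + \dots + m_n$, the zero-padding and the function names are hypothetical, and it does not reproduce how decBRWHash organises its lanes.

```python
# Sketch: sequential Horner polyHash and a c-lane variant that computes the same value.

P = (1 << 130) - 5


def poly_hash(tau: int, msg: list[int], p: int = P) -> int:
    """Horner evaluation of m_1*tau^(n-1) + ... + m_n mod p (assumed convention)."""
    acc = 0
    for m in msg:
        acc = (acc * tau + m) % p
    return acc


def poly_hash_c_way(tau: int, msg: list[int], c: int, p: int = P) -> int:
    """Same polynomial, evaluated as c independent Horner lanes with key tau^c."""
    # Left-pad with zeros so the length is a multiple of c; leading zero
    # coefficients do not change the value of the polynomial.
    pad = (-len(msg)) % c
    blocks = [0] * pad + list(msg)
    tau_c = pow(tau, c, p)
    lanes = [0] * c
    for k in range(0, len(blocks), c):
        # On a SIMD machine these c lane updates would be one vector operation.
        for j in range(c):
            lanes[j] = (lanes[j] * tau_c + blocks[k + j]) % p
    # Lane j carries the factor tau^(c-1-j): combine with one more Horner pass.
    return poly_hash(tau, lanes, p)
```

With `c = 4` this mirrors the lane count of a 4-way AVX2 layout; the final combination costs only $c-1$ extra multiplications regardless of message length.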

Innovation

Methods, ideas, or system contributions that make the work stand out.

SIMD-optimized decBRWHash using BRW polynomials

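Hand-optimised AVX2 assembly for 4-decBRWHash over the primes $2^{127}-1$ and $2^{130}-5$

Faster than Poly1305 over $2^{130}-5$: about 16% for messages of a few kilobytes and about 23% for messages of a few megabytes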
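A frequently cited reason for using BRW polynomials is their multiplication count. With the powers $\tau^2, \tau^4, \tau^8, \dots$ precomputed by squaring, the recursive BRW definition needs $\lfloor n/2 \rfloor$ field multiplications for $n$ blocks, whereas Horner's rule needs $n-1$ for the convention $m_1\tau^{n-1} + \dots + m_n$. The sketch below (hypothetical helper names, same counting model) just counts multiplications in the recursion. The count alone does not predict the reported 16% to 23% speed-ups, which are measured results.

```python
# Sketch: count field multiplications for BRW recursion versus Horner's rule.
from functools import lru_cache


@lru_cache(maxsize=None)
def brw_mults(n: int) -> int:
    """Non-squaring multiplications in the recursive BRW evaluation of n blocks,
    assuming tau^2, tau^4, tau^8, ... are precomputed by squaring."""
    if n <= 1:
        return 0  # empty message or a single block: no multiplication
    if n <= 3:
        return 1  # m1*tau + m2, or (tau + m1)(tau^2 + m2) + m3
    t = 1 << (n.bit_length() - 1)  # largest power of two <= n
    # One multiplication joins BRW(m_1..m_{t-1}) with (tau^t + m_t).
    return brw_mults(t - 1) + brw_mults(n - t) + 1


def horner_mults(n: int) -> int:
    """Multiplications for Horner's rule on m_1*tau^(n-1) + ... + m_n."""
    return max(n - 1, 0)


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(f"{n:5d} blocks: Horner {horner_mults(n):5d}, BRW {brw_mults(n):5d}")
```

For 4096 blocks this gives 4095 multiplications for Horner's rule versus 2048 for BRW, plus the handful of squarings needed for the powers of $\tau$.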

🔎 Similar Papers

BinomialHash: A Constant Time, Minimal Memory Consistent Hash Algorithm