MorphisHash: Improving Space Efficiency of ShockHash for Minimal Perfect Hashing

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: ShockHash constructs minimal perfect hash functions (MPHFs) using only slightly more than the lower bound of 1.44 bits per key, but most of its remaining space overhead comes from redundancy: when a set of candidate positions admits an MPHF, several different key-to-candidate mappings are valid, yet ShockHash stores one specific mapping. Method: This paper proposes MorphisHash, an MPHF construction that almost completely eliminates this redundancy. Contribution/Results: The theoretical analysis shows that MorphisHash saves Θ(ln n) bits compared to ShockHash, which corresponds to a factor of 20 less space overhead in practice. The authors note that the underlying technique may be of more general interest for compressing data structures.

📝 Abstract
A minimal perfect hash function (MPHF) maps a set of n keys to unique positions {1, ..., n}. Representing an MPHF requires at least 1.44 bits per key. ShockHash is a technique to construct an MPHF that requires just slightly more space. It gives each key two pseudorandom candidate positions. If each key can be mapped to one of its two candidate positions such that exactly one key is mapped to each position, then an MPHF is found. If not, ShockHash repeats the process with a new set of random candidate positions. ShockHash has to store how many repetitions were required and, for each key, to which of its two candidate positions it is mapped. However, when a given set of candidate positions can be used as an MPHF, there is not just one but multiple ways of mapping the keys to their candidate positions such that the mapping results in an MPHF. This redundancy accounts for the majority of the remaining space overhead in ShockHash. In this paper, we present MorphisHash, a technique that almost completely eliminates this redundancy. Our theoretical result is that MorphisHash saves Θ(ln(n)) bits compared to ShockHash. This corresponds to a factor of 20 less space overhead in practice. The technique used to accomplish this might be of more general interest for compressing data structures.
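The ShockHash search loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the BLAKE2b-based `candidates` function, the helper names, the 0-based positions, and the plain dict that stores one bit per key are all hypothetical choices made for this example, and a standard augmenting-path bipartite matching stands in for whatever test the real construction uses to decide whether a seed works.

```python
import hashlib

def candidates(key, seed, n):
    # Hypothetical hash choice: split one BLAKE2b digest into two
    # pseudorandom positions in range(n) (0-based, unlike {1, ..., n}).
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=16).digest()
    return (int.from_bytes(digest[:8], "little") % n,
            int.from_bytes(digest[8:], "little") % n)

def find_assignment(pairs):
    """Return one choice bit per key so that every position is used
    exactly once, or None if this set of candidate pairs admits no MPHF."""
    n = len(pairs)
    owner = [None] * n  # owner[p] = index of the key currently at position p

    def place(k, seen):
        # Augmenting-path step: try both candidates, evicting if needed.
        for p in pairs[k]:
            if p in seen:
                continue
            seen.add(p)
            if owner[p] is None or place(owner[p], seen):
                owner[p] = k
                return True
        return False

    for k in range(n):
        if not place(k, set()):
            return None
    bits = [0] * n
    for p, k in enumerate(owner):
        bits[k] = 0 if pairs[k][0] == p else 1
    return bits

def build(keys, max_seeds=100_000):
    # Retry with a new seed (a new set of candidate positions) until one works.
    n = len(keys)
    for seed in range(max_seeds):
        bits = find_assignment([candidates(k, seed, n) for k in keys])
        if bits is not None:
            return seed, dict(zip(keys, bits))
    raise RuntimeError("no suitable seed found")

def query(key, seed, bit_of, n):
    # The stored seed and the stored bit together select the final position.
    return candidates(key, seed, n)[bit_of[key]]

keys = ["apple", "banana", "cherry", "date",
        "elderberry", "fig", "grape", "honeydew"]
seed, bit_of = build(keys)
positions = [query(k, seed, bit_of, len(keys)) for k in keys]
```

In this sketch the stored information is exactly what the abstract lists: the seed (how many repetitions were needed) and one bit per key. The dict is only a stand-in for illustration; a space-efficient construction would need a compact structure for those bits.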
Problem

Research questions and friction points this paper is trying to address.

Improves the space efficiency of minimal perfect hashing beyond that of ShockHash.
Reduces the redundancy in ShockHash's mapping of keys to candidate positions.
Targets the majority of ShockHash's remaining space overhead, which stems from this redundancy.
Innovation

Methods, ideas, or system contributions that make the work stand out.

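MorphisHash almost completely eliminates the mapping redundancy of ShockHash.
Exploits the fact that, for a usable set of candidate positions, multiple key-to-candidate mappings yield a valid MPHF.
Saves Θ(ln(n)) bits compared to ShockHash, a factor of 20 less space overhead in practice.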
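The redundancy that MorphisHash targets can be made concrete with a small brute-force count. The sketch below uses a hand-picked, hypothetical set of candidate pairs (0-based positions) and enumerates every way of choosing one candidate per key. It only illustrates that several valid mappings exist for the same candidate positions; it does not show how MorphisHash removes the redundancy.

```python
from itertools import product
from math import log2

# Hypothetical candidate pairs for 6 keys; PAIRS[k] holds the two
# candidate positions of key k (0-based).
PAIRS = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 3), (5, 3)]

def valid_mappings(pairs):
    """Enumerate all bit vectors (one bit per key) whose selected
    positions cover range(n) exactly once, i.e. form an MPHF."""
    n = len(pairs)
    result = []
    for bits in product((0, 1), repeat=n):
        positions = [pairs[k][b] for k, b in enumerate(bits)]
        if sorted(positions) == list(range(n)):
            result.append(bits)
    return result

mappings = valid_mappings(PAIRS)
# Any of the valid mappings would do, so storing one specific mapping
# spends about log2(number of valid mappings) bits more than necessary.
redundant_bits = log2(len(mappings))
print(len(mappings), redundant_bits)  # prints "4 2.0"
```

Here the first three keys form a cycle that can be resolved in two directions, and the fourth and fifth keys can swap positions, giving 2 × 2 = 4 valid mappings for the same candidate positions.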