Fast and Optimal Differentially Private Frequent-Substring Mining

📅 2026-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of mining all frequent substrings from a collection of n user-contributed strings, each of length at most ℓ over an alphabet Σ, under ε-differential privacy. The authors build on top-down exploration of candidate substrings and add two techniques: a refined candidate-generation strategy that exploits structural properties of frequent prefixes and suffixes, and pruning of the search space guided by frequency relations. Together these remove the quadratic blow-ups of the prior algorithm by Bernardini et al. (PODS'25), which needs O(n²ℓ⁴) space and time. The new algorithm runs in O(nℓ log|Σ| + |Σ|) time and O(nℓ + |Σ|) space while retaining the same near-optimal error guarantees, which makes private frequent-substring mining practical on much larger datasets.
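
For a rough sense of scale, the bounds can be compared on hypothetical values. The figures below assume $n = 10^4$ strings of length at most $\ell = 10^2$ (illustrative numbers, not taken from the paper) and ignore the constants and the $\log|\Sigma|$ and $|\Sigma|$ terms hidden in the asymptotic notation:

```latex
\begin{align*}
n^2 \ell^4 &= (10^4)^2 \cdot (10^2)^4 = 10^{8} \cdot 10^{8} = 10^{16}, \\
n \ell     &= 10^4 \cdot 10^2 = 10^{6}, \\
\frac{n^2 \ell^4}{n \ell} &= n \ell^3 = 10^{10}.
\end{align*}
```

Under these assumptions the dominant term shrinks by a factor of $n\ell^3$, which is why the quadratic dependence on $n$ is the main obstacle to scaling the earlier approach.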

📝 Abstract

Given a dataset of $n$ user-contributed strings, each of length at most $\ell$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced an $\varepsilon$-differentially private algorithm achieving near-optimal error, but at the prohibitive cost of $O(n^2\ell^4)$ space and processing time. In this work, we present a new $\varepsilon$-differentially private algorithm that retains the same near-optimal error guarantees while reducing space complexity to $O(n\ell + |\Sigma|)$ and time complexity to $O(n\ell \log|\Sigma| + |\Sigma|)$, for input alphabet $\Sigma$. Our approach builds on a top-down exploration of candidate substrings but introduces two innovations: (i) a refined candidate-generation strategy that leverages the structural properties of frequent prefixes and suffixes, and (ii) pruning of the search space guided by frequency relations. These techniques eliminate the quadratic blow-ups inherent in prior work, enabling scalable frequent substring mining under differential privacy.
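
To make the general idea of top-down candidate exploration concrete, here is a minimal Python sketch of a naive baseline. It is not the paper's algorithm and does not reach its time, space, or error bounds. Several choices are assumptions made purely for illustration: the function name `private_frequent_substrings`, the use of document frequency (number of user strings containing a pattern), the even split of $\varepsilon$ across the $\ell$ length levels, the replace-one-user neighbouring relation, and a simple Apriori-style rule that extends a frequent string `u` only when its suffix overlaps the prefix of another frequent string `v`.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def doc_frequency(strings, pattern):
    # Number of user strings that contain `pattern` at least once.
    return sum(1 for s in strings if pattern in s)


def private_frequent_substrings(strings, alphabet, max_len, epsilon,
                                threshold, seed=None):
    """Naive level-by-level sketch (illustrative, not the paper's method)."""
    rng = random.Random(seed)
    # Clip inputs so that the sensitivity bound below really holds.
    strings = [s[:max_len] for s in strings]
    # Assumption: spend epsilon / max_len on each length level.
    eps_level = epsilon / max_len
    # Replacing one string of length <= max_len changes at most
    # 2 * max_len counts per level, each by 1 (L1 sensitivity).
    scale = 2.0 * max_len / eps_level

    candidates = sorted(alphabet)  # level 1: every letter is a candidate
    result = {}
    for _ in range(max_len):
        if not candidates:
            break
        frequent = []
        for cand in candidates:
            noisy = doc_frequency(strings, cand) + laplace(scale, rng)
            if noisy >= threshold:
                frequent.append(cand)
                result[cand] = noisy
        # Next level: u + v[-1] is a candidate only if u's suffix equals
        # v's prefix, so both length-k substrings were noisily frequent.
        by_prefix = {}
        for v in frequent:
            by_prefix.setdefault(v[:-1], []).append(v)
        candidates = sorted(u + v[-1]
                            for u in frequent
                            for v in by_prefix.get(u[1:], []))
    return result


if __name__ == "__main__":
    data = ["abab"] * 60 + ["bbbb"] * 40
    found = private_frequent_substrings(data, "abc", max_len=4,
                                        epsilon=1.0, threshold=50, seed=0)
    print(sorted(found))
```

Because candidates at each level are derived only from the previous level's noisy output, generating them is post-processing and costs no extra privacy budget. The price of this simplicity is noise of scale $2\ell^2/\varepsilon$ per count and a substring scan per candidate, which is the kind of overhead that more refined candidate generation and pruning aim to avoid.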

Problem

Research questions and friction points this paper is trying to address.

differential privacy

frequent substring mining

privacy-preserving data mining

string algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

differential privacy

frequent substring mining

space efficiency

time complexity