🤖 AI Summary
This work addresses the problem of efficiently mining all frequent substrings from collections of user-submitted strings under ε-differential privacy. The authors propose a top-down exploration of candidate substrings that exploits structural properties of frequent prefixes and suffixes to generate candidates, together with a pruning mechanism guided by frequency relations that shrinks the search space. Together, these techniques remove the quadratic blow-ups of prior work. For n strings of length at most ℓ over an alphabet Σ, space drops from O(n²ℓ⁴) to O(nℓ + |Σ|) and time from O(n²ℓ⁴) to O(nℓ log|Σ| + |Σ|), while the near-optimal error bound is retained. This makes private frequent substring mining scalable to much larger datasets.
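To make the size of the improvement concrete, here is a back-of-the-envelope comparison of the two space bounds. The values n = 10⁶ users and ℓ = 100 are hypothetical, chosen only for illustration and not taken from the paper. Constants and the |Σ| term are ignored, which assumes |Σ| is much smaller than nℓ:

$$
n^2\ell^4 = (10^6)^2 \cdot (10^2)^4 = 10^{20}, \qquad n\ell = 10^6 \cdot 10^2 = 10^{8}, \qquad \frac{n^2\ell^4}{n\ell} = n\ell^3 = 10^{12}.
$$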
📝 Abstract
Given a dataset of $n$ user-contributed strings, each of length at most $\ell$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced an $\varepsilon$-differentially private algorithm achieving near-optimal error, but at the prohibitive cost of $O(n^2\ell^4)$ space and processing time. In this work, we present a new $\varepsilon$-differentially private algorithm that retains the same near-optimal error guarantees while reducing space complexity to $O(n\ell + |\Sigma|)$ and time complexity to $O(n\ell \log |\Sigma| + |\Sigma|)$, for input alphabet $\Sigma$. Our approach builds on a top-down exploration of candidate substrings but introduces two innovations: (i) a refined candidate-generation strategy that leverages the structural properties of frequent prefixes and suffixes, and (ii) pruning of the search space guided by frequency relations. These techniques eliminate the quadratic blow-ups inherent in prior work, enabling scalable frequent substring mining under differential privacy.
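The sketch below illustrates the general idea behind prefix/suffix-based candidate generation and frequency pruning. It relies on one structural fact: if a substring occurs in a user's string, so do its prefix and suffix, so a string can only be frequent when both are. This is a minimal, **non-private** illustration under stated assumptions. It uses exact counts with no noise, so it is *not* ε-differentially private. It is not the authors' algorithm and does not attain the stated complexity bounds. "Support" is assumed to mean the number of user strings containing a substring. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def support_at_length(strings: List[str], candidates: Set[str], m: int) -> Counter:
    """For each length-m candidate, count how many user strings contain it.

    Each user contributes at most 1 to a candidate's count (assumed notion of support).
    """
    support: Counter = Counter()
    for s in strings:
        # Distinct length-m substrings of this user's string.
        seen = {s[i:i + m] for i in range(len(s) - m + 1)}
        support.update(seen & candidates)
    return support


def extend(frequent: Set[str]) -> Set[str]:
    """Join frequent length-m strings into length-(m+1) candidates.

    A candidate c = a + b[-1] is produced only when a[1:] == b[:-1], so its
    prefix c[:-1] == a and its suffix c[1:] == b are both frequent.
    """
    by_prefix: Dict[str, List[str]] = defaultdict(list)
    for b in frequent:
        by_prefix[b[:-1]].append(b)
    return {a + b[-1] for a in frequent for b in by_prefix.get(a[1:], [])}


def mine_frequent_substrings(strings: List[str], threshold: int) -> Dict[str, int]:
    """Level-wise (short-to-long) exploration with exact counts and NO privacy noise."""
    max_len = max((len(s) for s in strings), default=0)
    candidates: Set[str] = {ch for s in strings for ch in s}  # length-1 candidates
    result: Dict[str, int] = {}
    m = 1
    while candidates and m <= max_len:
        support = support_at_length(strings, candidates, m)
        # Frequency pruning: infrequent strings are never extended, because any
        # extension is contained in at most as many user strings.
        frequent = {c for c in candidates if support[c] >= threshold}
        result.update({c: support[c] for c in frequent})
        candidates = extend(frequent)
        m += 1
    return result
```

For example, `mine_frequent_substrings(["abab", "abc", "bab"], 2)` yields the supports a: 3, b: 3, ab: 3, ba: 2, bab: 2. Here `aba` is generated as a candidate but pruned, because only one user string contains it. In a differentially private version, the exact supports would be replaced by noisy estimates, and the pruning decisions would be made on those estimates instead.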