Fast and Optimal Differentially Private Frequent-Substring Mining

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of efficiently mining all frequent substrings from a collection of user-contributed strings under ε-differential privacy. The authors propose a top-down candidate exploration strategy that exploits structural properties of frequent prefixes and suffixes to generate candidates, together with a pruning step guided by frequency relations that shrinks the search space. The method removes the quadratic blow-ups of the prior approach, reducing both space and time from O(n²ℓ⁴) to O(nℓ + |Σ|) space and O(nℓ log|Σ| + |Σ|) time, while retaining the same near-optimal error guarantees. This makes private frequent-substring mining scalable to much larger inputs.

📝 Abstract
Given a dataset of $n$ user-contributed strings, each of length at most $\ell$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced an $\varepsilon$-differentially private algorithm achieving near-optimal error, but at the prohibitive cost of $O(n^2\ell^4)$ space and processing time. In this work, we present a new $\varepsilon$-differentially private algorithm that retains the same near-optimal error guarantees while reducing space complexity to $O(n\ell + |\Sigma|)$ and time complexity to $O(n\ell\log|\Sigma| + |\Sigma|)$, for input alphabet $\Sigma$. Our approach builds on a top-down exploration of candidate substrings but introduces two innovations: (i) a refined candidate-generation strategy that leverages the structural properties of frequent prefixes and suffixes, and (ii) pruning of the search space guided by frequency relations. These techniques eliminate the quadratic blow-ups inherent in prior work, enabling scalable frequent substring mining under differential privacy.
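To make the general idea concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It mines substrings level by level, adds Laplace noise to per-user (document) counts, and extends a candidate only if both its prefix and its suffix passed the noisy threshold at the previous level. The function names, the even budget split across levels, and the noise scale `max_len**2 / epsilon` are assumptions made for illustration; the sketch does not attain the complexity or error bounds stated in the abstract.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def doc_counts(strings, candidates, k):
    # For each candidate of length k, count how many strings contain it
    # (each string contributes at most 1 per candidate).
    counts = dict.fromkeys(candidates, 0)
    for s in strings:
        seen = {s[i:i + k] for i in range(len(s) - k + 1)}
        for sub in seen:
            if sub in counts:
                counts[sub] += 1
    return counts


def private_frequent_substrings(strings, alphabet, max_len, threshold,
                                epsilon, seed=0):
    rng = random.Random(seed)
    # Illustrative assumption: budget split evenly over max_len levels,
    # with at most max_len counts per level affected by one user.
    scale = max_len * max_len / epsilon
    letters = sorted(alphabet)
    frequent = []
    level = list(letters)  # length-1 candidates: the whole alphabet
    for k in range(1, max_len + 1):
        counts = doc_counts(strings, level, k)
        # Noise is added to every candidate, including zero counts.
        kept = [c for c in level
                if counts[c] + laplace(scale, rng) > threshold]
        frequent.extend(kept)
        kept_set = set(kept)
        # Extend a kept prefix c by one letter only if the resulting
        # length-k suffix was also kept at this level.
        level = [c + a for c in kept for a in letters
                 if (c + a)[1:] in kept_set]
        if not level:
            break
    return frequent
```

Calling `private_frequent_substrings(strings, "abc", max_len=4, threshold=2.5, epsilon=1.0)` returns the substrings whose noisy counts exceeded the threshold, ordered by length. Because candidates at each level are derived only from the previously released output, the candidate set itself does not depend directly on the raw data.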
Problem

Research questions and friction points this paper is trying to address.

differential privacy
frequent substring mining
privacy-preserving data mining
string algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

differential privacy
frequent substring mining
space efficiency
time complexity
candidate pruning
Peaker Guo
Institute of Science Tokyo, Japan
Rayne Holland
Postdoctoral Fellow, CSIRO
data privacy, network security, data structures
Hao Wu
University of Waterloo, Canada