Reconstructing Sets of Strings from Their k-way Projections: Algorithms & Complexity

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the *k*-way projection reconstruction problem for sets of strings: given the projections of a set onto every *k*-element subset of positions, where the positions are known but the string that each projected *k*-mer came from is not, when can the original set be uniquely reconstructed? It also examines whether a single string can be reconstructed, and characterizes the *k*-wise independence threshold, that is, the largest *k* for which the *k*-way projections reveal no information about the original set. Methodologically, the authors model the problem with *non-contiguous k-mers* and extend overlap graph algorithms from genetic reconstruction to arbitrary (non-adjacent) position subsets, relaxing the conventional contiguous-*k*-mer assumption. Theoretically, they prove the problem is NP-hard and inapproximable in general, yet fixed-parameter tractable when either string length or *k* is bounded. Empirically, the algorithm is shown to be efficient in a variety of experiments, with high-level explanations and analytic approximations of how its cost scales with the problem parameters.
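As a concrete illustration of the setup summarized above, the following minimal Python sketch computes the k-way projections of a small set and exhibits two different sets that share all of their 2-way projections. The function name `k_way_projections`, the 0-based positions, and the choice to record each projection as a set (rather than a multiset) of k-mers are assumptions made for this example, not details taken from the paper.

```python
from itertools import combinations


def k_way_projections(strings, k):
    """Map every k-subset of positions to the set of k-mers observed there.

    The positions are kept (they are the dictionary keys), but the link
    between a k-mer and the string it came from is discarded.
    """
    strings = set(strings)
    lengths = {len(s) for s in strings}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty set of equal-length strings")
    n = lengths.pop()
    return {
        idx: frozenset("".join(s[i] for i in idx) for s in strings)
        for idx in combinations(range(n), k)
    }


# Binary strings of length 3 with even parity and with odd parity.
even = {"000", "011", "101", "110"}
odd = {"001", "010", "100", "111"}

same_at_2 = k_way_projections(even, 2) == k_way_projections(odd, 2)
same_at_3 = k_way_projections(even, 3) == k_way_projections(odd, 3)
print(same_at_2, same_at_3)
```

In this toy case the two sets cannot be told apart from their 2-way projections and only separate at k = 3, which is the kind of ambiguity the reconstruction question is about.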

📝 Abstract
Graphs are a powerful tool for analyzing large data sets, but many real-world phenomena involve interactions that go beyond the simple pairwise relationships captured by a graph. In this paper we introduce and study a simple combinatorial model to capture higher-order dependencies from an algorithms and computational complexity perspective. Specifically, we introduce the String Set Reconstruction problem, which asks when a set of strings can be reconstructed from seeing only the k-way projections of strings in the set. This problem is distinguished from genetic reconstruction problems in that we allow projections from any k indices and we maintain knowledge of those indices, but not which k-mer came from which string. We give several results on the complexity of this problem, including hardness results, inapproximability, and parametrized complexity. Our main result is the introduction of a new algorithm for this problem using a modified version of overlap graphs from genetic reconstruction algorithms. A key difference we must overcome is that in our setting the k-mers need not be contiguous, unlike the setting of genetic reconstruction. We exhibit our algorithm's efficiency in a variety of experiments, and give high-level explanations for how its complexity is observed to scale with various parameters. We back up these explanations with analytic approximations. We also consider two related problems: deciding whether a single string can be reconstructed from the k-way projections of a given set of strings, and finding the largest k at which we get no information about the original data set from its k-way projections (i.e., the largest k for which it is "k-wise independent").
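The notion of a largest "k-wise independent" k can be made concrete with a small sketch. The code below assumes one plausible reading that the abstract does not spell out: a set is treated as k-wise independent when, at every k-subset of positions, every possible k-mer over the alphabet occurs equally often. The function names `is_k_wise_uniform` and `independence_threshold` are hypothetical, and the brute-force loop is meant only for tiny inputs.

```python
from collections import Counter
from itertools import combinations, product


def is_k_wise_uniform(strings, k, alphabet):
    """Assumed definition: every k-mer occurs equally often at every k-subset."""
    strings = sorted(set(strings))
    if not strings:
        raise ValueError("expected a non-empty set of strings")
    n = len(strings[0])
    expected, remainder = divmod(len(strings), len(alphabet) ** k)
    if remainder:
        return False  # counts cannot be equal across all k-mers
    for idx in combinations(range(n), k):
        counts = Counter(tuple(s[i] for i in idx) for s in strings)
        if any(counts[kmer] != expected for kmer in product(alphabet, repeat=k)):
            return False
    return True


def independence_threshold(strings, alphabet):
    """Largest k (possibly 0) for which the set is k-wise uniform.

    Uniformity at k implies uniformity at k - 1, so the scan can stop
    at the first failure.
    """
    n = len(next(iter(strings)))
    best = 0
    for k in range(1, n + 1):
        if not is_k_wise_uniform(strings, k, alphabet):
            break
        best = k
    return best


even_parity = {"000", "011", "101", "110"}
threshold = independence_threshold(even_parity, "01")
print(threshold)
```

Under this assumed definition, the even-parity strings of length 3 look uniform at k = 1 and k = 2 but not at k = 3.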

Problem

Research questions and friction points this paper is trying to address.

Reconstructing string sets from k-way projections with known indices
Developing algorithms for higher-order dependencies beyond pairwise relationships
Analyzing complexity of reconstruction when k-mers are non-contiguous
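To make the reconstruction question more tangible, here is a brute-force sketch that lists every string consistent with a given family of k-way projections. It is not the paper's algorithm: it simply enumerates all strings over the alphabet, so it is exponential in the string length and only suitable for toy inputs. The names `k_way_projections` and `consistent_strings`, and the use of sets rather than multisets, are assumptions for illustration.

```python
from itertools import combinations, product


def k_way_projections(strings, k):
    """Map every k-subset of positions to the set of k-mers observed there."""
    n = len(next(iter(strings)))
    return {
        idx: {"".join(s[i] for i in idx) for s in strings}
        for idx in combinations(range(n), k)
    }


def consistent_strings(projections, n, alphabet):
    """All length-n strings whose projection onto every index set was observed.

    Every string of the original set is in the result; a string outside
    the result cannot belong to any set with these projections.
    """
    return {
        "".join(w)
        for w in product(alphabet, repeat=n)
        if all(
            "".join(w[i] for i in idx) in kmers
            for idx, kmers in projections.items()
        )
    }


small = {"000", "011"}
parity = {"000", "011", "101", "110"}

candidates_small = consistent_strings(k_way_projections(small, 2), 3, "01")
candidates_parity = consistent_strings(k_way_projections(parity, 2), 3, "01")
print(sorted(candidates_small), len(candidates_parity))
```

For the two-string set the candidates coincide with the original strings, while for the even-parity set every binary string of length 3 is a candidate, so the 2-way projections alone do not pin the set down.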

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modified overlap graphs for string reconstruction
Handling non-contiguous k-mers in projections
Complexity analysis with hardness and inapproximability results
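The details of the modified overlap graph are not given on this page, so the following is only a generic sketch of how overlap can be defined when k-mers are non-contiguous: two projected k-mers are treated as compatible when they agree on every position their index sets share. The functions `compatible` and `compatibility_edges`, and the rule of linking only k-mers from different, intersecting index sets, are assumptions made for illustration and should not be read as the authors' construction.

```python
from itertools import combinations


def k_way_projections(strings, k):
    """Map every k-subset of positions to the set of k-mers observed there."""
    n = len(next(iter(strings)))
    return {
        idx: {"".join(s[i] for i in idx) for s in strings}
        for idx in combinations(range(n), k)
    }


def compatible(idx_a, kmer_a, idx_b, kmer_b):
    """True if the two k-mers agree on all positions they have in common."""
    a = dict(zip(idx_a, kmer_a))
    b = dict(zip(idx_b, kmer_b))
    return all(a[i] == b[i] for i in a.keys() & b.keys())


def compatibility_edges(projections):
    """Pairs of (index set, k-mer) nodes that overlap and do not conflict."""
    nodes = [
        (idx, kmer)
        for idx, kmers in projections.items()
        for kmer in sorted(kmers)
    ]
    return [
        (u, v)
        for u, v in combinations(nodes, 2)
        if u[0] != v[0] and set(u[0]) & set(v[0]) and compatible(*u, *v)
    ]


edges = compatibility_edges(k_way_projections({"000", "011"}, 2))
print(len(edges))
```

With contiguous k-mers the shared positions are always a suffix of one k-mer and a prefix of the next; checking agreement on an arbitrary intersection of index sets is one natural way to generalize that idea.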