🤖 AI Summary
Triangle counting in hypergraph streams faces two key challenges: (i) incomplete structural categorization, since existing definitions distinguish only inner and outer triangles and omit the hybrid cases in between; and (ii) rigid sampling mechanisms, since schemes that predefine the number of sampled hyperedges adapt poorly to highly variable hyperedge sizes and stringent memory constraints.
Method: We introduce the first complete three-way classification of hyper-vertex triangles: inner, hybrid, and outer. We propose HTCount, a dynamic, memory-aware streaming algorithm, and its partition-based variant HTCount-P, which integrate reservoir sampling, adaptive sample-size adjustment, and memory-partitioning strategies, with theoretical guarantees of unbiased, low-variance online estimates.
Results: Experiments on real-world datasets show that HTCount reduces relative estimation error by 1 to 2 orders of magnitude compared with state-of-the-art methods, sustains high throughput, and remains accurate even under strict memory limits, improving both structural expressiveness and practical utility.
📝 Abstract
Triangle counting in hypergraph streams, including both hyper-vertex and hyperedge triangles, is a fundamental problem in hypergraph analytics, with broad applications. However, existing methods face two key limitations: (i) an incomplete classification of hyper-vertex triangle structures, typically considering only inner or outer triangles; and (ii) inflexible sampling schemes that predefine the number of sampled hyperedges, which is impractical under strict memory constraints due to highly variable hyperedge sizes. To address these challenges, we first introduce a complete classification of hyper-vertex triangles, including inner, hybrid, and outer triangles. Based on this, we develop HTCount, a reservoir-based algorithm that dynamically adjusts the sample size based on the available memory M. To further improve memory utilization and reduce estimation error, we develop HTCount-P, a partition-based variant that adaptively partitions unused memory into independent sample subsets. We provide theoretical analysis of the unbiasedness and variance bounds of the proposed algorithms. Case studies demonstrate the expressiveness of our triangle structures in revealing meaningful interaction patterns. Extensive experiments on real-world hypergraphs show that both our algorithms achieve highly accurate triangle count estimates under strict memory constraints, with relative errors that are 1 to 2 orders of magnitude lower than those of existing methods and consistently high throughput.
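The reservoir-based core that HTCount builds on can be illustrated with classic reservoir sampling over a hyperedge stream. The sketch below is not the paper's algorithm: HTCount additionally adjusts the sample size against a memory budget M (since hyperedges have variable sizes) and maintains unbiased triangle-count estimators, both of which are omitted here. The function name `reservoir_sample` and the toy stream are illustrative.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep k hyperedges uniformly at random from a stream of unknown
    length (standard reservoir sampling, a.k.a. Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for t, hyperedge in enumerate(stream):
        if t < k:
            # Fill the reservoir with the first k hyperedges.
            sample.append(hyperedge)
        else:
            # Replace a random slot with probability k / (t + 1),
            # which keeps every prefix element equally likely to survive.
            j = rng.randrange(t + 1)
            if j < k:
                sample[j] = hyperedge
    return sample

# Toy hyperedge stream: each hyperedge is a set of vertices.
stream = [frozenset(s) for s in
          ([1, 2, 3], [2, 3, 4], [1, 4], [3, 4, 5], [1, 2, 5], [2, 5, 6])]
sample = reservoir_sample(stream, k=3, seed=0)
print(len(sample))  # 3
```

HTCount replaces the fixed count `k` with a dynamically adjusted sample size so that the total size of the stored hyperedges stays within the memory budget M; HTCount-P further splits unused memory into independent sample subsets to reduce variance.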