🤖 AI Summary
Triangle counting in hypergraph streams faces two key challenges: (i) incomplete structural categorization, since existing definitions distinguish only inner and outer triangles and omit the hybrid cases in between; and (ii) rigid sampling mechanisms, since schemes that predefine the number of sampled hyperedges adapt poorly to highly variable hyperedge sizes and stringent memory constraints.
Method: We introduce the first complete three-way classification of hyper-vertex triangles: inner, hybrid, and outer. We propose HTCount, a dynamic, memory-aware streaming algorithm, and its partition-based variant HTCount-P, which integrate reservoir sampling, adaptive sample-size adjustment, and memory-partitioning strategies, with theoretical guarantees of unbiased, low-variance online estimates.
Results: Experiments on real-world datasets show that HTCount reduces relative estimation error by 1 to 2 orders of magnitude compared with state-of-the-art methods, sustains high throughput, and remains accurate even under strict memory limits, improving both structural expressiveness and practical utility.
📝 Abstract
Triangle counting in hypergraph streams, including both hyper-vertex and hyperedge triangles, is a fundamental problem in hypergraph analytics, with broad applications. However, existing methods face two key limitations: (i) an incomplete classification of hyper-vertex triangle structures, typically considering only inner or outer triangles; and (ii) inflexible sampling schemes that predefine the number of sampled hyperedges, which is impractical under strict memory constraints due to highly variable hyperedge sizes. To address these challenges, we first introduce a complete classification of hyper-vertex triangles, including inner, hybrid, and outer triangles. Based on this, we develop HTCount, a reservoir-based algorithm that dynamically adjusts the sample size based on the available memory M. To further improve memory utilization and reduce estimation error, we develop HTCount-P, a partition-based variant that adaptively partitions unused memory into independent sample subsets. We provide theoretical analysis of the unbiasedness and variance bounds of the proposed algorithms. Case studies demonstrate the expressiveness of our triangle structures in revealing meaningful interaction patterns. Extensive experiments on real-world hypergraphs show that both our algorithms achieve highly accurate triangle count estimates under strict memory constraints, with relative errors that are 1 to 2 orders of magnitude lower than those of existing methods and consistently high throughput.
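The reservoir-based core that HTCount builds on can be illustrated with classic reservoir sampling over a hyperedge stream. The sketch below is not the paper's algorithm: HTCount additionally adjusts the sample size against a memory budget M (since hyperedges have variable sizes) and maintains unbiased triangle-count estimators, both of which are omitted here. The function name `reservoir_sample` and the toy stream are illustrative.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep k hyperedges uniformly at random from a stream of unknown
    length (standard reservoir sampling, a.k.a. Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for t, hyperedge in enumerate(stream):
        if t < k:
            # Fill the reservoir with the first k hyperedges.
            sample.append(hyperedge)
        else:
            # Replace a random slot with probability k / (t + 1),
            # which keeps every prefix element equally likely to survive.
            j = rng.randrange(t + 1)
            if j < k:
                sample[j] = hyperedge
    return sample

# Toy hyperedge stream: each hyperedge is a set of vertices.
stream = [frozenset(s) for s in
          ([1, 2, 3], [2, 3, 4], [1, 4], [3, 4, 5], [1, 2, 5], [2, 5, 6])]
sample = reservoir_sample(stream, k=3, seed=0)
print(len(sample))  # 3
```

HTCount replaces the fixed count `k` with a dynamically adjusted sample size so that the total size of the stored hyperedges stays within the memory budget M; HTCount-P further splits unused memory into independent sample subsets to reduce variance.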