Triangle Counting in Hypergraph Streams: A Complete and Practical Approach

📅 2025-08-30
🤖 AI Summary
Triangle counting in hypergraph streams faces two key challenges: (i) incomplete structural classification, since existing definitions of hyper-vertex triangles typically consider only inner or outer triangles and omit hybrid cases; and (ii) inflexible sampling, since existing schemes fix the number of sampled hyperedges in advance, which is impractical under strict memory constraints because hyperedge sizes vary widely. Method: The paper introduces a complete three-way classification of hyper-vertex triangles: inner, hybrid, and outer. It proposes HTCount, a reservoir-based streaming algorithm that dynamically adjusts the sample size to the available memory M, and HTCount-P, a partition-based variant that adaptively partitions unused memory into independent sample subsets. Both come with a theoretical analysis of unbiasedness and variance bounds. Results: Experiments on real-world hypergraphs show that both algorithms achieve relative errors 1 to 2 orders of magnitude lower than existing methods under strict memory constraints, with consistently high throughput.

📝 Abstract
Triangle counting in hypergraph streams, including both hyper-vertex and hyper-edge triangles, is a fundamental problem in hypergraph analytics, with broad applications. However, existing methods face two key limitations: (i) an incomplete classification of hyper-vertex triangle structures, typically considering only inner or outer triangles; and (ii) inflexible sampling schemes that predefine the number of sampled hyperedges, which is impractical under strict memory constraints due to highly variable hyperedge sizes. To address these challenges, we first introduce a complete classification of hyper-vertex triangles, including inner, hybrid, and outer triangles. Based on this, we develop HTCount, a reservoir-based algorithm that dynamically adjusts the sample size based on the available memory M. To further improve memory utilization and reduce estimation error, we develop HTCount-P, a partition-based variant that adaptively partitions unused memory into independent sample subsets. We provide theoretical analysis of the unbiasedness and variance bounds of the proposed algorithms. Case studies demonstrate the expressiveness of our triangle structures in revealing meaningful interaction patterns. Extensive experiments on real-world hypergraphs show that both our algorithms achieve highly accurate triangle count estimates under strict memory constraints, with relative errors that are 1 to 2 orders of magnitude lower than those of existing methods and consistently high throughput.
Problem

Research questions and friction points this paper is trying to address.

Incomplete classification of hyper-vertex triangle structures in hypergraphs
Inflexible sampling schemes impractical under strict memory constraints
Need for accurate, memory-efficient counting of both hyper-vertex and hyper-edge triangles in hypergraph streams
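
To illustrate the second friction point, here is a minimal sketch of classic reservoir sampling (Algorithm R) over a hyperedge stream with a fixed sample count k. It is not the paper's HTCount algorithm. The names `reservoir_sample` and `memory_units` are hypothetical, and modelling memory cost as the number of vertex IDs stored is an assumption made only for this example.

```python
import random
from typing import FrozenSet, Iterable, List

Hyperedge = FrozenSet[int]  # a hyperedge is modelled as a set of vertex IDs


def reservoir_sample(stream: Iterable[Hyperedge], k: int,
                     rng: random.Random) -> List[Hyperedge]:
    """Classic Algorithm R: keep a uniform sample of at most k hyperedges."""
    sample: List[Hyperedge] = []
    for t, edge in enumerate(stream, start=1):
        if t <= k:
            # The first k hyperedges always enter the reservoir.
            sample.append(edge)
        else:
            # Keep the t-th hyperedge with probability k / t,
            # replacing a uniformly chosen resident.
            j = rng.randrange(t)
            if j < k:
                sample[j] = edge
    return sample


def memory_units(sample: List[Hyperedge]) -> int:
    """Assumed cost model: one unit per stored vertex ID."""
    return sum(len(edge) for edge in sample)


if __name__ == "__main__":
    # Synthetic stream whose hyperedge sizes vary between 1 and 7 vertices.
    stream = [frozenset(range(i, i + 1 + (i % 7))) for i in range(200)]
    for seed in range(3):
        sample = reservoir_sample(stream, k=10, rng=random.Random(seed))
        print(seed, len(sample), memory_units(sample))
```

The sample count is always k, but the memory actually used depends on which hyperedges happen to be retained. A fixed k therefore cannot guarantee that the sample fits a budget M, which is the limitation the paper's memory-aware sample-size adjustment is meant to address.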
Innovation

Methods, ideas, or system contributions that make the work stand out.

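Complete classification of hyper-vertex triangles into inner, hybrid, and outer triangles
HTCount: a reservoir-based algorithm that dynamically adjusts the sample size to the available memory M
HTCount-P: a partition-based variant that adaptively partitions unused memory into independent sample subsets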
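
As a rough intuition for why independent sample subsets can reduce estimation error, consider a general statistical fact. The source does not state how HTCount-P combines its subsets, so the averaging below is an assumption, as are the symbols $\hat{T}_i$, $\sigma_i^2$, and $p$. If $\hat{T}_1, \dots, \hat{T}_p$ are independent unbiased estimates of the true triangle count $T$ with variances $\sigma_i^2$, their mean stays unbiased and its variance shrinks:

```latex
\mathbb{E}\!\left[\frac{1}{p}\sum_{i=1}^{p}\hat{T}_i\right] = T,
\qquad
\operatorname{Var}\!\left[\frac{1}{p}\sum_{i=1}^{p}\hat{T}_i\right]
  = \frac{1}{p^{2}}\sum_{i=1}^{p}\sigma_i^{2}.
```

When all subsets have the same variance $\sigma^2$, the right-hand side reduces to $\sigma^2 / p$.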
Lingkai Meng
Shanghai Jiao Tong University, China
Long Yuan
Wuhan University of Technology
Databases, Graph Mining, Data Mining
Xuemin Lin
Shanghai Jiao Tong University, China
Wenjie Zhang
University of New South Wales, Australia
Ying Zhang
Zhejiang Gongshang University, China