Accelerating Triangle Counting with Real Processing-in-Memory Systems

📅 2025-05-07
🤖 AI Summary
Traditional CPU-based triangle counting (TC) is limited by memory bandwidth and low data reuse, which restricts its scalability. This work presents the first TC implementation on a commercial Processing-in-Memory (PIM) platform, UPMEM. It proposes a PIM-aware design that combines *vertex coloring* with *multi-level sampling*: coloring avoids expensive communication between PIM cores, reservoir sampling keeps each core's edge set within its limited DRAM bank capacity, the Misra-Gries summary speeds up counting on graphs with high-degree nodes, and uniform edge sampling trades accuracy for faster approximate results. The system supports both exact and approximate TC, as well as dynamic graph processing. Experiments report speedups ranging from several-fold to over an order of magnitude over state-of-the-art CPU implementations across graphs of different sizes, which significantly relieves memory bandwidth pressure.
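
The role of vertex coloring can be illustrated with a small sketch. It assumes a scheme in which every core owns one sorted triple of colors, receives only the edges whose two endpoint colors fit inside that triple, and counts only the triangles whose vertex colors match the triple exactly, so no core needs data from another. The `color_of` function, the modulo coloring, and the matching rule are hypothetical simplifications for illustration, not the paper's exact scheme.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement


def color_of(vertex, num_colors):
    # Hypothetical coloring: a real system would likely hash the vertex ID.
    return vertex % num_colors


def partition_edges(edges, num_colors):
    """Send each edge to every core whose color triple can host it."""
    triples = list(combinations_with_replacement(range(num_colors), 3))
    cores = {triple: [] for triple in triples}
    for u, v in edges:
        pair = Counter((color_of(u, num_colors), color_of(v, num_colors)))
        for triple in triples:
            # Host the edge only if its two colors fit inside the triple.
            if not pair - Counter(triple):
                cores[triple].append((u, v))
    return cores


def count_local(triple, edges, num_colors):
    """Count triangles whose sorted vertex colors equal this core's triple."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                for w in adj[u] & adj[v]:
                    if v < w:
                        colors = sorted(color_of(x, num_colors) for x in (u, v, w))
                        if tuple(colors) == triple:
                            total += 1
    return total


def count_triangles_colored(edges, num_colors):
    """Sum the independent per-core counts; each triangle is counted once."""
    cores = partition_edges(edges, num_colors)
    return sum(count_local(t, e, num_colors) for t, e in cores.items())


# Complete graph on 6 vertices, which contains C(6, 3) = 20 triangles.
k6 = list(combinations(range(6), 2))
print(count_triangles_colored(k6, num_colors=3))
```

With `num_colors` colors this sketch uses one core per multiset of three colors. More colors mean smaller per-core edge lists but more replication of each edge across cores.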

📝 Abstract
Triangle Counting (TC) is a procedure that involves enumerating the number of triangles within a graph. It has important applications in numerous fields, such as social or biological network analysis and network security. TC is a memory-bound workload that does not scale efficiently in conventional processor-centric systems due to several memory accesses across large memory regions and low data reuse. However, recent Processing-in-Memory (PIM) architectures present a promising solution to alleviate these bottlenecks. Our work presents the first TC algorithm that leverages the capabilities of the UPMEM system, the first commercially available PIM architecture, while at the same time addressing its limitations. We use a vertex coloring technique to avoid expensive communication between PIM cores and employ reservoir sampling to address the limited amount of memory available in the PIM cores' DRAM banks. In addition, our work makes use of the Misra-Gries summary to speed up counting triangles on graphs with high-degree nodes and uniform sampling of the graph edges for quicker approximate results. Our PIM implementation surpasses state-of-the-art CPU-based TC implementations when processing dynamic graphs in Coordinate List format, showcasing the effectiveness of the UPMEM architecture in addressing TC's memory-bound challenges.
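
The uniform edge sampling mentioned in the abstract can be sketched as follows. The sketch assumes the standard estimator: keep each edge independently with probability `keep_prob`, count triangles in the sample, and divide by `keep_prob ** 3`, since a triangle survives only if all three of its edges are kept. The function names are illustrative and are not taken from the paper.

```python
import random


def count_triangles(edges):
    """Exact triangle count on an undirected edge list (COO-style pairs)."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # Each triangle u < v < w is found exactly once.
    return sum(
        1
        for u in adj
        for v in adj[u] if u < v
        for w in adj[u] & adj[v] if v < w
    )


def estimate_triangles(edges, keep_prob, rng):
    """Estimate the triangle count from a uniform sample of the edges."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    sampled = [edge for edge in edges if rng.random() < keep_prob]
    # A triangle survives sampling with probability keep_prob ** 3.
    return count_triangles(sampled) / keep_prob ** 3


rng = random.Random(0)
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(estimate_triangles(k4, keep_prob=1.0, rng=rng))
```

Lower values of `keep_prob` reduce the work roughly in proportion to the number of dropped edges, at the cost of higher variance in the estimate.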

Problem

Research questions and friction points this paper is trying to address.

Triangle Counting scales poorly on conventional processor-centric systems because of many memory accesses across large memory regions and low data reuse
Communication between UPMEM PIM cores is expensive, and each core's DRAM bank offers limited memory
Graphs with high-degree nodes slow down counting, and exact counts are costly when an approximate result would suffice
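
The Misra-Gries summary named in the abstract is a small-memory way to find frequent items in a stream. A minimal sketch, assuming it is applied to the stream of edge endpoints so that frequent items correspond to high-degree nodes; how the paper then uses those nodes is not shown here, and the function names are illustrative.

```python
def misra_gries(stream, k):
    """Return a summary with at most k - 1 counters.

    Any item that occurs more than n / k times in a stream of length n
    is guaranteed to appear in the summary (counts are underestimates).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def endpoints(edges):
    """Yield both endpoints of every edge, so item frequency equals degree."""
    for u, v in edges:
        yield u
        yield v


# Star graph: node 0 is connected to nodes 1..9 and has degree 9.
star = [(0, leaf) for leaf in range(1, 10)]
summary = misra_gries(endpoints(star), k=3)
print(0 in summary)
```

The summary needs only `k - 1` counters regardless of graph size, which suits a core with very little local memory.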

Innovation

Methods, ideas, or system contributions that make the work stand out.

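Leverages the UPMEM PIM system, the first commercially available PIM architecture, for triangle counting
Uses vertex coloring to avoid expensive communication between PIM cores
Applies reservoir sampling to cope with the limited memory in the PIM cores' DRAM banks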
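
Reservoir sampling keeps a fixed-size uniform sample of an edge stream, which is how a core with bounded memory can still process a larger graph. The sketch below assumes the classic single-pass algorithm and the standard correction factor for triangles: once `seen` edges exceed `capacity`, all three edges of a given triangle are in the sample with probability `capacity * (capacity - 1) * (capacity - 2) / (seen * (seen - 1) * (seen - 2))`. Names and parameters are illustrative, not the paper's.

```python
import random


def reservoir_sample(stream, capacity, rng):
    """Keep a uniform random sample of at most `capacity` items."""
    sample = []
    seen = 0
    for item in stream:
        seen += 1
        if len(sample) < capacity:
            sample.append(item)
        else:
            # Replace a random slot with probability capacity / seen.
            slot = rng.randrange(seen)
            if slot < capacity:
                sample[slot] = item
    return sample, seen


def triangle_scale_factor(capacity, seen):
    """Factor that turns a count on the sample into a full-graph estimate."""
    if capacity < 3:
        raise ValueError("capacity must be at least 3 to hold a triangle")
    if seen <= capacity:
        return 1.0  # Nothing was dropped, so the count is exact.
    kept = capacity * (capacity - 1) * (capacity - 2)
    total = seen * (seen - 1) * (seen - 2)
    return total / kept


rng = random.Random(0)
edges = [(i, i + 1) for i in range(100)]
sample, seen = reservoir_sample(edges, capacity=10, rng=rng)
print(len(sample), seen, triangle_scale_factor(10, seen))
```

Multiplying the triangle count found in `sample` by `triangle_scale_factor(capacity, seen)` gives an approximate count for the whole stream.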