🤖 AI Summary
Existing probabilistic circuits (PCs) suffer from high memory and computational overhead, as well as poor scalability, due to their reliance on dense sum layers. This work introduces sparse, structured Monarch matrices into the sum modules of PCs for the first time, jointly exploiting the inherent sparsity of PCs and hardware-efficient tensorized operations. The differentiable sparse parameterization is derived directly from the natural multiplicative structure of PC inference, combining sparse tensor computation, architectural innovation in PC design, and differentiable structure learning. Evaluated on benchmarks including Text8, LM1B, and ImageNet, the approach achieves state-of-the-art generative modeling performance. Crucially, it substantially reduces training FLOPs at equivalent performance levels and markedly improves model scalability.
📄 Abstract
Probabilistic Circuits (PCs) are tractable representations of probability distributions that allow exact and efficient computation of likelihoods and marginals. Recent advances have improved the scalability of PCs either by leveraging their sparsity or by using tensorized operations for better hardware utilization; however, no existing method fully exploits both aspects simultaneously. In this paper, we propose a novel sparse and structured parameterization for the sum blocks in PCs. By replacing dense matrices with sparse Monarch matrices, we significantly reduce memory and computation costs, enabling unprecedented scaling of PCs. From a theoretical standpoint, our construction arises naturally from circuit multiplication; from a practical one, compared to previous efforts on scaling up tractable probabilistic models, our approach not only achieves state-of-the-art generative modeling performance on challenging benchmarks such as Text8, LM1B, and ImageNet, but also demonstrates superior scaling behavior, reaching the same performance with substantially less compute as measured by the number of floating-point operations (FLOPs) during training.
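To make the cost savings concrete, the sketch below shows the standard Monarch matrix-vector product as two batched block-diagonal multiplies interleaved with a reshape-transpose permutation, which costs O(n^1.5) instead of O(n^2) for an n-dimensional input. This is a minimal NumPy illustration of the general Monarch structure, not the paper's implementation; all names (`monarch_matvec`, `B1`, `B2`) are hypothetical.

```python
import numpy as np

def monarch_matvec(B1, B2, x):
    """Apply the Monarch matrix M = P @ D2 @ P @ D1 to x, where D1, D2 are
    block-diagonal with m blocks B1[b], B2[b] of size m x m, and P is the
    reshape-transpose permutation on n = m*m coordinates."""
    m = B1.shape[0]
    X = x.reshape(m, m)                   # view the length-n input as m x m
    Y = np.einsum('bij,bj->bi', B1, X)    # first block-diagonal factor
    Z = np.einsum('bij,bj->bi', B2, Y.T)  # permute, then second factor
    return Z.T.reshape(-1)                # permute back and flatten

# Sanity check against an explicit dense construction of the same matrix.
rng = np.random.default_rng(0)
m = 3
n = m * m
B1 = rng.standard_normal((m, m, m))
B2 = rng.standard_normal((m, m, m))
x = rng.standard_normal(n)

D1 = np.zeros((n, n)); D2 = np.zeros((n, n)); P = np.zeros((n, n))
for b in range(m):
    D1[b*m:(b+1)*m, b*m:(b+1)*m] = B1[b]
    D2[b*m:(b+1)*m, b*m:(b+1)*m] = B2[b]
for r in range(m):
    for c in range(m):
        P[c*m + r, r*m + c] = 1.0         # reshape-transpose permutation

M = P @ D2 @ P @ D1                       # the dense n x n Monarch matrix
assert np.allclose(M @ x, monarch_matvec(B1, B2, x))
```

The two einsums cost 2m^3 = 2n^1.5 multiply-adds each, versus n^2 for a dense sum layer, which is the source of the FLOP reduction claimed above; the factors `B1` and `B2` remain ordinary parameter tensors, so the product is differentiable end to end.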