🤖 AI Summary
This work addresses the problem of exact inference of output distributions for probabilistic programs containing loops. To this end, it introduces the novel concept of *occupancy invariants*, which are dual to traditional martingale-based approaches and naturally incorporate the initial distribution while simultaneously establishing positive almost-sure termination. The method combines occupancy measures with generating functions and employs a template-based approach to automatically synthesize probabilistic loop invariants, enabling mathematically precise reasoning about output distributions. Experimental evaluation demonstrates that the proposed automated synthesis framework effectively supports exact distribution inference for a range of benchmark probabilistic programs with loops.
📝 Abstract
A fundamental computational task in probabilistic programming is to infer a program's output (posterior) distribution from a given initial (prior) distribution. This problem is challenging, especially for expressive languages that feature loops or unbounded recursion. While most of the existing literature focuses on statistical approximation, in this paper we address the problem of mathematically exact inference. To achieve this for programs with loops, we rely on a relatively underexplored type of probabilistic loop invariant, which is linked to a loop's so-called occupation measure. The occupation measure associates program states with their expected number of visits, given the initial distribution. Based on this, we derive the notion of an occupation invariant. Such invariants are essentially dual to probabilistic martingales, the predominant technique for formal probabilistic loop analysis in the literature. A key feature of occupation invariants is that they can take the initial distribution into account and often yield a proof of positive almost sure termination as a by-product. Finally, we present an automatic, template-based invariant synthesis approach for occupation invariants by encoding them as generating functions. The approach is implemented and evaluated on a set of benchmarks.