🤖 AI Summary
Existing graph distances, such as the structural Hamming distance, are defined on graph topology alone and ignore the statistical or causal models the graphs represent, which can lead to undesirable behavior. This paper proposes a model-oriented framework for graph distances: each graph is treated as a statistical model, the graphs are organized into a partially ordered set by model inclusion, and the distance between two graphs is the length of a shortest path through neighbors in this order, which yields a metric on the space of graphs. The framework applies across graph classes, covering probabilistic graphical models (undirected graphs, completed partially directed acyclic graphs) and causal graphical models (directed acyclic graphs, maximally oriented partially directed acyclic graphs). The paper analyzes the theoretical and empirical behavior of these distances and develops algorithmic tools for computing and bounding them. The core contribution is a shift from topology-based to model-based graph distances, giving a distance on the parameter space suited to evaluating estimators, establishing consistency, and building confidence sets.
📝 Abstract
A well-defined distance on the parameter space is key to evaluating estimators, ensuring consistency, and building confidence sets. While there are typically standard distances to adopt in a continuous space, this is not the case for combinatorial parameters such as graphs that represent statistical models. Existing proposals like the structural Hamming distance are defined on the graphs rather than the models they represent and can hence lead to undesirable behaviors. We propose a model-oriented framework for defining the distance between graphs that is applicable across many different graph classes. Our approach treats each graph as a statistical model and organizes the graphs in a partially ordered set based on model inclusion. This induces a neighborhood structure, from which we define the model-oriented distance as the length of a shortest path through neighbors, yielding a metric in the space of graphs. We apply this framework to both probabilistic graphical models (e.g., undirected graphs and completed partially directed acyclic graphs) and causal graphical models (e.g., directed acyclic graphs and maximally oriented partially directed acyclic graphs). We analyze the theoretical and empirical behaviors of model-oriented distances. Algorithmic tools are also developed for computing and bounding these distances.
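The construction described in the abstract (order graphs by model inclusion, link neighbors, take shortest paths) can be illustrated with a minimal brute-force sketch. It is not the paper's algorithm. It assumes the undirected-graph case and assumes that one model includes another exactly when the edge set of the first is a subset of the second; the function names (`all_undirected_graphs`, `model_included`, `hasse_neighbors`, `model_distance`) are hypothetical and chosen only for this illustration.

```python
from collections import deque
from itertools import combinations


def all_undirected_graphs(nodes):
    """Enumerate every undirected graph on `nodes` as a frozenset of edges."""
    possible_edges = [frozenset(pair) for pair in combinations(nodes, 2)]
    graphs = []
    for k in range(len(possible_edges) + 1):
        for edges in combinations(possible_edges, k):
            graphs.append(frozenset(edges))
    return graphs


def model_included(g, h):
    """Assumed inclusion rule for undirected graphs:
    the model of g is a submodel of the model of h iff edges(g) is a subset of edges(h)."""
    return g <= h


def hasse_neighbors(graphs, leq):
    """Link two graphs when one model strictly includes the other and
    no third graph's model lies strictly in between (a cover relation)."""
    neighbors = {g: set() for g in graphs}
    for g in graphs:
        for h in graphs:
            if g != h and leq(g, h):
                has_between = any(
                    m != g and m != h and leq(g, m) and leq(m, h) for m in graphs
                )
                if not has_between:
                    neighbors[g].add(h)
                    neighbors[h].add(g)
    return neighbors


def model_distance(source, target, neighbors):
    """Breadth-first search: length of a shortest path through neighbors."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return seen[current]
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    return None  # only reached if the target cannot be reached from the source


graphs = all_undirected_graphs(["a", "b", "c"])
neighbors = hasse_neighbors(graphs, model_included)
empty = frozenset()
chain = frozenset({frozenset({"a", "b"}), frozenset({"b", "c"})})
fork = frozenset({frozenset({"a", "b"}), frozenset({"a", "c"})})
print(model_distance(empty, chain, neighbors))
print(model_distance(chain, fork, neighbors))
```

Under the assumed inclusion rule, every neighbor step adds or removes a single edge, so in this toy setting the distance equals the number of differing edges. For other graph classes the `model_included` predicate would have to be replaced by a test suited to that class, and the resulting distance need not agree with an edge count. The enumeration is exponential in the number of nodes and is meant only to make the definition concrete.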