🤖 AI Summary
This work addresses a limitation of the Panoptic Quality (PQ) metric: its instance matching is unambiguous only when the IoU threshold exceeds 0.5. Below that threshold, several matching strategies become possible, which matters when instances are fragmented, adjacent objects are hard to delineate, or annotations are noisy. The authors recast segment matching as a constrained bipartite assignment problem and independently bound the number of matches allowed on the prediction side and on the ground-truth side. This yields four strategies (One-to-One, Many-to-One, One-to-Many, and Many-to-Many); the first three are well-defined within the PQ framework, while Many-to-Many falls outside it. Central to the framework is a vertex-based accounting of true positives, false negatives, and false positives, anchored to ground-truth and predicted segments rather than to matching edges. The framework extends naturally to part-aware panoptic segmentation, which the authors explore on biomedical data. They also release an open-source package built on Panoptica that exposes Voronoi-based region-wise analysis, part-aware evaluation, and Area Under Threshold Curve computations, and they report across configurable case studies how different combinations of thresholds and matching strategies behave.
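For reference, the role of the 0.5 threshold can be made explicit. The block below restates the standard PQ definition and the usual uniqueness argument; neither is spelled out on this page, and the symbols ($p$, $g$, $\mathit{TP}$, $\mathit{FP}$, $\mathit{FN}$) follow common convention rather than the paper's own notation.

```latex
% Standard Panoptic Quality over matched pairs (p, g):
\[
  \mathrm{PQ} \;=\;
  \frac{\sum_{(p,g)\in \mathit{TP}} \mathrm{IoU}(p,g)}
       {|\mathit{TP}| + \tfrac{1}{2}|\mathit{FP}| + \tfrac{1}{2}|\mathit{FN}|}
\]

% Why IoU > 0.5 forces a unique match (segments are non-overlapping):
% suppose two disjoint predictions p_1, p_2 both satisfy IoU(p_i, g) > 0.5. Then
\[
  |p_i \cap g| \;>\; \tfrac{1}{2}\,|p_i \cup g| \;\ge\; \tfrac{1}{2}\,|g|
  \quad (i = 1, 2)
  \;\;\Longrightarrow\;\;
  |p_1 \cap g| + |p_2 \cap g| \;>\; |g| ,
\]
% which contradicts p_1 \cap p_2 = \emptyset. The same argument applies with the
% roles of prediction and ground truth swapped, so the matching is One-to-One.
```

Once the threshold drops to 0.5 or below, this inequality no longer rules out several candidates per segment, which is the regime the work examines.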
📝 Abstract
The Panoptic Quality (PQ) metric is the standard for jointly evaluating instance and semantic segmentation. However, its original definition relies on a One-to-One matching between predicted and ground truth segments, which is straightforward only when the IoU threshold exceeds 0.5. Below 0.5, multiple matching strategies emerge in a poorly explored problem space. We systematically elucidate this space by recasting segment matching as a constrained bipartite assignment problem. Independently bounding the prediction- and ground-truth-side degrees yields four matching strategies: One-to-One, Many-to-One, One-to-Many, and Many-to-Many. We show that the first three are well-defined within the PQ framework, while Many-to-Many falls outside it. These strategies become relevant when instances are fragmented, adjacent objects are difficult to delineate, or annotations are noisy. Central to our framework is a vertex-based accounting of true positives (TP), false negatives (FN), and false positives (FP), anchored to ground truth and predicted segments rather than to matching edges. We further show that the framework extends naturally to part-aware panoptic segmentation, and we explore part-aware evaluation on biomedical data. Across configurable case studies, we report how different combinations of thresholds and matching strategies behave in practice. We release a unified open-source package built on Panoptica. It exposes Voronoi-based region-wise analysis, part-aware evaluation, and Area Under Threshold Curve computations as configurable options.
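To make the degree-bounded matching and the vertex-based counts concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm and not the Panoptica API: the names `DEGREE_BOUNDS`, `match_segments`, and `vertex_counts` are hypothetical, the direction assigned to "Many-to-One" versus "One-to-Many" is an assumption, edges are chosen greedily by descending IoU rather than by an optimal assignment, and the counting rule (a ground-truth segment is a TP if it has at least one accepted edge) is one plausible reading of "anchored to segments rather than to matching edges".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical strategy table: (max edges per ground-truth segment,
# max edges per predicted segment). None means "unbounded".
# Which side is "many" in each name is an assumption made for this sketch.
DEGREE_BOUNDS: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "one_to_one": (1, 1),
    "many_to_one": (None, 1),  # several predictions may attach to one ground truth
    "one_to_many": (1, None),  # one prediction may attach to several ground truths
}

Edge = Tuple[int, int, float]  # (ground-truth index, prediction index, IoU)


def match_segments(iou: List[List[float]], threshold: float, strategy: str) -> List[Edge]:
    """Greedily pick edges with IoU > threshold that respect the degree bounds.

    iou[g][p] is the IoU between ground-truth segment g and prediction p.
    """
    gt_cap, pred_cap = DEGREE_BOUNDS[strategy]
    # Candidate edges, strongest overlap first; ties broken by index.
    candidates = sorted(
        ((v, g, p) for g, row in enumerate(iou) for p, v in enumerate(row) if v > threshold),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    gt_degree = [0] * len(iou)
    pred_degree = [0] * (len(iou[0]) if iou else 0)
    edges: List[Edge] = []
    for v, g, p in candidates:
        if gt_cap is not None and gt_degree[g] >= gt_cap:
            continue  # ground-truth vertex already saturated
        if pred_cap is not None and pred_degree[p] >= pred_cap:
            continue  # prediction vertex already saturated
        edges.append((g, p, v))
        gt_degree[g] += 1
        pred_degree[p] += 1
    return edges


def vertex_counts(edges: List[Edge], n_gt: int, n_pred: int) -> Dict[str, int]:
    """Count TP/FN over ground-truth vertices and FP over prediction vertices."""
    matched_gt = {g for g, _, _ in edges}
    matched_pred = {p for _, p, _ in edges}
    return {
        "tp": len(matched_gt),
        "fn": n_gt - len(matched_gt),
        "fp": n_pred - len(matched_pred),
    }


# Toy case: ground truth 0 is fragmented into predictions 0 and 1;
# ground truth 1 is only weakly touched by prediction 2.
iou_matrix = [
    [0.45, 0.40, 0.00],
    [0.00, 0.00, 0.10],
]
results = {
    name: vertex_counts(match_segments(iou_matrix, 0.3, name), n_gt=2, n_pred=3)
    for name in DEGREE_BOUNDS
}
```

In this toy case with a threshold of 0.3, the One-to-One bound accepts only the stronger fragment and leaves the other as a false positive, while the Many-to-One bound accepts both fragments and lowers the false-positive count from 2 to 1. The sketch deliberately stops at the counts: how the IoU term of PQ is aggregated when several fragments attach to one segment is defined by the paper and is not guessed here.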