🤖 AI Summary
This work addresses structured ambiguity under partial observability, where existing selective prediction methods estimate uncertainty from a single predictive branch and therefore struggle when local temporal evidence is contradictory, producing misleading confidence scores. To overcome this limitation, the authors propose CHASE (Competing Hypotheses for Ambiguity-Aware Selective Prediction), a framework built on a competing-hypothesis mechanism. CHASE explicitly compares scores across multiple structured temporal hypotheses and trains a margin-aware, ranking-based selector to separate reliable predictions from inherently uncertain cases. The method is evaluated on hidden connectivity inference using a controlled, physically grounded simulator inspired by the dynamics of giant unilamellar vesicles (GUVs), and it shows zero-shot qualitative transfer to representative real GUV videos without retraining or fine-tuning. Relative to canonical uncertainty baselines, at 80% coverage CHASE improves overall ambiguity-aligned abstention by up to 11.0% and three-way accuracy in the very-high-ambiguity regime by up to 8.8% (both relative); at 90% coverage, it reduces overall risk by 9.9%.
📝 Abstract
Standard selective prediction methods typically estimate uncertainty from the output of a single predictive branch. While effective for general uncertainty estimation, these approaches often struggle under partial observability, where local temporal evidence can be contradictory and standard confidence scores become misleading. We introduce CHASE (Competing Hypotheses for Ambiguity-Aware Selective Prediction), a selective prediction framework that explicitly compares structured temporal explanations to determine whether to commit to a decision or abstain. Because genuine ambiguity causes the score gap between competing hypotheses to collapse, CHASE optimizes a ranking-aware selector over these hypothesis margins to globally separate safe commitments from fundamentally uncertain ones. We evaluate this framework on the problem of hidden connectivity inference, using a controlled, physically grounded simulator inspired by the dynamics of giant unilamellar vesicles (GUVs), alongside zero-shot qualitative transfer (without retraining or fine-tuning) to representative real GUV videos. Our experiments demonstrate that explicitly reasoning over competing hypotheses yields a better balance across accuracy, abstention alignment, and selective risk. Compared to canonical uncertainty baselines, CHASE achieves statistically significant gains in overall no-abstain accuracy, three-way accuracy, and overall ambiguity-aligned abstention (at 80% coverage). Specifically, it yields up to an 11.0% relative mean improvement in overall alignment, alongside up to an 8.8% relative boost in three-way accuracy in the very-high ambiguity regime. By keeping selective risk on par with the best baselines at 80% coverage and reducing overall risk by 9.9% at 90% coverage, this framework offers a more reliable approach to decision-making under structured ambiguity.
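The abstract's core idea, that a collapsing score gap between competing hypotheses signals ambiguity, can be illustrated with a minimal sketch. This is not the CHASE selector, which is learned with a ranking-aware objective; it only ranks samples by the raw margin between the two highest hypothesis scores and abstains on the lowest-margin fraction. The function names, the toy scores, and the 60% coverage target are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def hypothesis_margin(scores):
    """Gap between the best and second-best hypothesis score for each sample."""
    scores = np.asarray(scores, dtype=float)
    top2 = np.sort(scores, axis=1)[:, -2:]  # columns: second-best, best
    return top2[:, 1] - top2[:, 0]

def select_at_coverage(margins, coverage):
    """Commit on the `coverage` fraction with the largest margins; abstain on the rest."""
    margins = np.asarray(margins, dtype=float)
    n_keep = int(round(coverage * len(margins)))
    order = np.argsort(-margins, kind="stable")  # largest margin first
    commit = np.zeros(len(margins), dtype=bool)
    commit[order[:n_keep]] = True
    return commit

def selective_risk(pred, label, commit):
    """Error rate measured only on the samples the selector committed to."""
    pred, label = np.asarray(pred), np.asarray(label)
    if not commit.any():
        return 0.0
    return float(np.mean(pred[commit] != label[commit]))

# Toy scores for five samples over three hypothetical hypotheses (assumed data).
scores = np.array([
    [2.0, 0.1, 0.0],   # clear winner
    [0.9, 1.0, 0.2],   # near tie: ambiguous
    [0.3, 0.2, 3.0],   # clear winner
    [1.5, 0.2, 0.1],   # clear winner
    [0.5, 0.6, 0.4],   # near tie: ambiguous
])
labels = np.array([0, 0, 2, 0, 2])

pred = scores.argmax(axis=1)
margins = hypothesis_margin(scores)
commit = select_at_coverage(margins, coverage=0.6)

full_risk = float(np.mean(pred != labels))   # risk with no abstention
risk = selective_risk(pred, labels, commit)  # risk on committed samples only
print(commit, full_risk, risk)
```

In this toy data the two near-tie samples are exactly the ones the argmax gets wrong, so abstaining on them removes every error among the committed predictions. Real data will not separate this cleanly, which is presumably why the paper trains a selector over the margins rather than thresholding them directly.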