When Sufficient is not Enough: Utilizing the Rashomon Effect for Complete Evidence Extraction

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Conventional feature attribution methods yield only minimal sufficient evidence, failing to meet regulatory and interpretability requirements in high-stakes domains such as healthcare, where identification of *all* relevant features—i.e., complete evidence—is essential. Method: We propose a multi-model attribution ensemble framework grounded in the Rashomon effect, which dynamically fuses attributions from heterogeneous models via an adaptive thresholding mechanism and incorporates explicit evidence supervision during training to enhance modeling of attribution completeness. Contribution/Results: Evaluated on a medical dataset with human-annotated complete evidence, our method increases complete evidence recall from 0.60 (single-model baseline) to 0.86—a statistically significant improvement. This work provides the first systematic empirical validation that multi-model collaboration effectively mitigates attribution incompleteness. It establishes a verifiable, explainability-aware pathway for high-assurance AI systems, advancing both theoretical understanding and practical deployment of trustworthy machine learning.

Technology Category

Application Category

📝 Abstract
Feature attribution methods typically provide minimal sufficient evidence justifying a model decision. However, in many applications this is inadequate. For compliance and cataloging, the full set of contributing features must be identified - complete evidence. We perform a case study on a medical dataset which contains human-annotated complete evidence. We show that individual models typically recover only subsets of complete evidence and that aggregating evidence from several models improves evidence recall from $sim$0.60 (single best model) to $sim$0.86 (ensemble). We analyze the recall-precision trade-off, the role of training with evidence, dynamic ensembles with certainty thresholds, and discuss implications.
Problem

Research questions and friction points this paper is trying to address.

Identifying full feature sets for compliance and cataloging requirements
Improving evidence recall from individual models through ensemble methods
Analyzing recall-precision trade-offs in complete evidence extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregating evidence from multiple models
Using ensembles to improve evidence recall
Applying dynamic ensembles with certainty thresholds
🔎 Similar Papers
No similar papers found.