🤖 AI Summary
Conventional feature attribution methods yield only minimal sufficient evidence, failing to meet regulatory and interpretability requirements in high-stakes domains such as healthcare, where identification of *all* relevant features—i.e., complete evidence—is essential.
Method: We propose a multi-model attribution ensemble framework grounded in the Rashomon effect, which dynamically fuses attributions from heterogeneous models via an adaptive thresholding mechanism and incorporates explicit evidence supervision during training to enhance modeling of attribution completeness.
Contribution/Results: Evaluated on a medical dataset with human-annotated complete evidence, our method increases complete evidence recall from 0.60 (single-model baseline) to 0.86—a statistically significant improvement. This work provides the first systematic empirical validation that multi-model collaboration effectively mitigates attribution incompleteness. It establishes a verifiable, explainability-aware pathway for high-assurance AI systems, advancing both theoretical understanding and practical deployment of trustworthy machine learning.
📝 Abstract
Feature attribution methods typically provide minimal sufficient evidence justifying a model decision. However, in many applications this is inadequate: for compliance and cataloging, the full set of contributing features, i.e., complete evidence, must be identified. We perform a case study on a medical dataset that contains human-annotated complete evidence. We show that individual models typically recover only subsets of the complete evidence, and that aggregating evidence from several models improves evidence recall from $\sim$0.60 (single best model) to $\sim$0.86 (ensemble). We analyze the recall-precision trade-off, the role of training with evidence, and dynamic ensembles with certainty thresholds, and discuss implications.
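The dynamic ensemble described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the certainty threshold, feature names, and model outputs are hypothetical, and the aggregation shown (a union over the evidence sets of sufficiently certain models) is one plausible reading of the certainty-gated fusion the abstract mentions.

```python
def ensemble_evidence(model_outputs, certainty_threshold=0.7):
    """Union the evidence sets of all models whose prediction
    certainty meets the threshold (a simple dynamic ensemble)."""
    evidence = set()
    for certainty, features in model_outputs:
        if certainty >= certainty_threshold:
            evidence |= set(features)
    return evidence

def evidence_recall(predicted, gold):
    """Fraction of the annotated complete evidence that was recovered."""
    return len(predicted & gold) / len(gold) if gold else 1.0

# Toy example: three models, each recovering only a subset of evidence.
gold = {"fever", "cough", "rash", "fatigue"}
outputs = [
    (0.9, ["fever", "cough"]),
    (0.8, ["rash", "cough"]),
    (0.5, ["fatigue"]),  # below the certainty threshold: excluded
]
predicted = ensemble_evidence(outputs)
print(evidence_recall(predicted, gold))  # 0.75
```

Each individual model here recalls at most 0.5 of the gold evidence, while the certainty-gated union reaches 0.75, mirroring the single-model-versus-ensemble gap reported in the abstract.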