Locally Pareto-Optimal Interpretations for Black-Box Machine Learning Models

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Pareto-optimal explanation methods for black-box model interpretability either lack formal guarantees or suffer from severe scalability bottlenecks, making it difficult to achieve accuracy and interpretability at the same time. Method: We propose a multi-objective explanation synthesis framework grounded in local optimality verification. It formulates local Pareto-optimality certification as a Boolean satisfiability (SAT) problem and uses multi-objective Monte Carlo tree search for efficient, bounded-depth candidate optimization, avoiding exhaustive global search. Contribution/Results: Our approach provides rigorous formal guarantees of local Pareto optimality while achieving explanation quality comparable to globally Pareto-optimal methods. Empirical evaluation across multiple benchmark datasets demonstrates both high explanation fidelity and superior scalability. By enabling verifiable, computationally tractable local explanations, the framework bridges theoretical soundness with practical deployability for trustworthy AI.

📝 Abstract
Creating meaningful interpretations for black-box machine learning models involves balancing two often conflicting objectives: accuracy and explainability. Exploring the trade-off between these objectives is essential for developing trustworthy interpretations. While many techniques for multi-objective interpretation synthesis have been developed, they typically lack formal guarantees on the Pareto-optimality of the results. Methods that do provide such guarantees, on the other hand, often face severe scalability limitations when exploring the Pareto-optimal space. To address this, we develop a framework based on local optimality guarantees that enables more scalable synthesis of interpretations. Specifically, we consider the problem of synthesizing a set of Pareto-optimal interpretations with local optimality guarantees, within the immediate neighborhood of each solution. Our approach begins with a multi-objective learning or search technique, such as Multi-Objective Monte Carlo Tree Search, to generate a best-effort set of Pareto-optimal candidates with respect to accuracy and explainability. We then verify local optimality for each candidate as a Boolean satisfiability problem, which we solve using a SAT solver. We demonstrate the efficacy of our approach on a set of benchmarks, comparing it against previous methods for exploring the Pareto-optimal front of interpretations. In particular, we show that our approach yields interpretations that closely match those synthesized by methods offering global guarantees.
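The first stage of the pipeline described above keeps only the non-dominated candidates from a best-effort multi-objective search. A minimal sketch of that filtering step follows; the function names and the convention that both objectives (accuracy, explainability) are maximized are illustrative assumptions, not details from the paper:

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (maximizing every
    objective): `a` is at least as good everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and a != b

def pareto_front(candidates, objectives):
    """Filter a best-effort candidate set down to its non-dominated members.

    `objectives` maps a candidate interpretation to its
    (accuracy, explainability) vector.
    """
    scored = [(c, objectives(c)) for c in candidates]
    return [c for c, v in scored
            if not any(dominates(w, v) for _, w in scored)]
```

In the paper's setting, the candidates would be interpretations produced by Multi-Objective Monte Carlo Tree Search, and each surviving member of the front is then passed to the SAT-based local-optimality check.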
Problem

Research questions and friction points this paper is trying to address.

Balancing accuracy and explainability in black-box model interpretations
Synthesizing Pareto-optimal interpretations with local optimality guarantees
Addressing scalability limitations in Pareto-optimal interpretation synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local Pareto-optimality guarantees for interpretations
Multi-Objective Monte Carlo Tree Search for candidate generation
SAT-based verification of local Pareto-optimality
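The verification step poses the question "does any immediate neighbor dominate this candidate?" as a satisfiability query, so an unsatisfiable (UNSAT) answer certifies local Pareto-optimality. The brute-force analogue below enumerates the Hamming-distance-1 neighbors of a Boolean feature mask; this neighborhood structure and the helper names are illustrative assumptions, not the paper's actual SAT encoding:

```python
def flip(mask, i):
    """Return the mask with bit i flipped (an immediate neighbor)."""
    return mask[:i] + (1 - mask[i],) + mask[i + 1:]

def certify_local_optimality(mask, acc, expl):
    """Search the Hamming-1 neighborhood for a dominating neighbor.

    Returning (True, None) mirrors the SAT solver's UNSAT certificate:
    no neighbor is at least as good on both objectives and strictly
    better on one.
    """
    base = (acc(mask), expl(mask))
    for i in range(len(mask)):
        neighbor = flip(mask, i)
        score = (acc(neighbor), expl(neighbor))
        if all(s >= b for s, b in zip(score, base)) and score != base:
            return False, neighbor  # witness found: not locally optimal
    return True, None  # no dominating neighbor: locally Pareto-optimal
```

Unlike this explicit enumeration, a SAT encoding lets the solver certify neighborhoods far too large to enumerate, which is the scalability advantage the paper claims over globally exhaustive methods.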