🤖 AI Summary
Existing explanation-synthesis methods for black-box model interpretability either lack formal Pareto-optimality guarantees or suffer from severe scalability bottlenecks, making it difficult to achieve accuracy and interpretability simultaneously.
Method: We propose a multi-objective explanation synthesis framework grounded in local optimality verification. It uses multi-objective Monte Carlo tree search for efficient, bounded-depth optimization to generate candidate interpretations, then certifies the local Pareto-optimality of each candidate by encoding the check as a Boolean satisfiability (SAT) problem, obviating exhaustive global search.
Contribution/Results: Our approach provides rigorous formal guarantees of local Pareto-optimality while achieving explanation quality comparable to methods with global guarantees. Empirical evaluation across multiple benchmark datasets demonstrates both high explanation fidelity and superior scalability. By enabling verifiable, computationally tractable local explanations, the framework offers a practical route to trustworthy AI, combining theoretical soundness with deployability.
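The trade-off the summary describes rests on Pareto dominance between (accuracy, explainability) scores. The following is a minimal sketch of that notion, with illustrative score tuples that are assumptions for the example, not values from the paper:

```python
# Sketch: Pareto dominance over (accuracy, interpretability) scores.
# Both objectives are treated as maximized; the candidate values below
# are illustrative, not from the paper's experiments.

def dominates(a, b):
    """True if a Pareto-dominates b: at least as good in every
    objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated points (the Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

candidates = [(0.90, 0.2), (0.85, 0.5), (0.85, 0.4), (0.70, 0.9)]
front = pareto_front(candidates)
# (0.85, 0.4) is dominated by (0.85, 0.5); the other three are incomparable.
```

A method with global guarantees must certify this non-dominance against the entire search space; the framework summarized above restricts the certificate to a local neighborhood, which is what makes SAT encoding tractable.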
📝 Abstract
Creating meaningful interpretations for black-box machine learning models involves balancing two often conflicting objectives: accuracy and explainability. Exploring the trade-off between these objectives is essential for developing trustworthy interpretations. While many techniques for multi-objective interpretation synthesis have been developed, they typically lack formal guarantees on the Pareto-optimality of the results. Methods that do provide such guarantees, on the other hand, often face severe scalability limitations when exploring the Pareto-optimal space. To address this, we develop a framework based on local optimality guarantees that enables more scalable synthesis of interpretations. Specifically, we consider the problem of synthesizing a set of Pareto-optimal interpretations with local optimality guarantees that hold within the immediate neighborhood of each solution. Our approach begins with a multi-objective learning or search technique, such as Multi-Objective Monte Carlo Tree Search, to generate a best-effort set of Pareto-optimal candidates with respect to accuracy and explainability. We then encode local-optimality verification for each candidate as a Boolean satisfiability problem, which we solve using a SAT solver. We demonstrate the efficacy of our approach on a set of benchmarks, comparing it against previous methods for exploring the Pareto-optimal front of interpretations. In particular, we show that our approach yields interpretations that closely match those synthesized by methods offering global guarantees.
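The verification step the abstract describes checks that no interpretation in a candidate's immediate neighborhood dominates it. A brute-force sketch of that condition is below; it enumerates the neighborhood explicitly, whereas the paper's method encodes the same check symbolically for a SAT solver. The bit-vector encoding of interpretations and the two objective functions are assumptions made for illustration:

```python
# Sketch: local Pareto-optimality check by neighborhood enumeration.
# An interpretation is modeled as a bit-vector of included features
# (an assumption for this example); neighbors differ in at most
# `radius` bits. The paper's SAT encoding checks the same condition
# symbolically instead of enumerating.
from itertools import combinations

def neighbors(bits, radius=1):
    """Yield all interpretations within Hamming distance `radius`."""
    n = len(bits)
    for r in range(1, radius + 1):
        for idx in combinations(range(n), r):
            flipped = list(bits)
            for i in idx:
                flipped[i] = 1 - flipped[i]
            yield tuple(flipped)

def is_locally_pareto_optimal(bits, objectives, radius=1):
    """True if no neighbor dominates `bits` under the given
    objective functions (all maximized)."""
    score = tuple(f(bits) for f in objectives)
    for nb in neighbors(bits, radius):
        nb_score = tuple(f(nb) for f in objectives)
        if all(x >= y for x, y in zip(nb_score, score)) and nb_score != score:
            return False  # a neighbor dominates: not locally optimal
    return True

# Illustrative objectives (assumed, not from the paper): accuracy
# rewards including features 0 and 2; interpretability rewards sparsity.
objectives = [lambda b: b[0] + b[2], lambda b: -sum(b)]
```

Here `is_locally_pareto_optimal((1, 0, 1, 0), objectives)` holds, while `(1, 1, 1, 0)` fails because dropping the redundant feature 1 keeps accuracy and improves sparsity. Enumerating neighborhoods scales exponentially in the radius, which is one motivation for the symbolic SAT formulation.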