phepy: Visual Benchmarks and Improvements for Out-of-Distribution Detectors

📅 2025-03-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses two challenges in out-of-distribution (OOD) detection: ambiguous ID/OOD boundaries, and evaluations that are hard to visualise when data is high-dimensional, sparse, or biased. To this end, we propose an interpretable and reproducible benchmarking framework and methodology. First, we construct three lightweight, visually interpretable toy benchmarks that systematically evaluate linear and nonlinear discriminability as well as the identification of thin in-distribution subspaces within high-dimensional spaces. Second, we introduce *t-poking*, a boundary-refinement method leveraging t-distribution-based feature perturbation, augmented with an OOD-sample weighting mechanism to mitigate ID/OOD boundary ambiguity. Third, we design the first supervised evaluation paradigm explicitly tailored for interpretability validation in OOD detection. Extensive experiments across multiple baselines demonstrate significant improvements in ID-OOD separation accuracy. Our framework provides both a diagnostic evaluation toolkit and practical guidelines for advancing robust and interpretable OOD detection.

📝 Abstract
Applying machine learning to increasingly high-dimensional problems with sparse or biased training data increases the risk that a model is used on inputs outside its training domain. For such out-of-distribution (OOD) inputs, the model can no longer make valid predictions, and its error is potentially unbounded. Testing OOD detection methods on real-world datasets is complicated by the ambiguity around which inputs are in-distribution (ID) or OOD. We design a benchmark for OOD detection, which includes three novel and easily visualisable toy examples. These simple examples provide direct and intuitive insight into whether the detector is able to detect (1) linear and (2) non-linear concepts and (3) identify thin ID subspaces (needles) within high-dimensional spaces (haystacks). We use our benchmark to evaluate the performance of various methods from the literature. Since tactile examples of OOD inputs may benefit OOD detection, we also review several simple methods to synthesise OOD inputs for supervised training. We introduce two improvements, $t$-poking and OOD sample weighting, to make supervised detectors more precise at the ID-OOD boundary. This is especially important when conflicts between real ID and synthetic OOD samples blur the decision boundary. Finally, we provide recommendations for constructing and applying out-of-distribution detectors in machine learning.
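The abstract only describes t-poking and OOD sample weighting at a high level; the paper's exact formulation is not given here. The following is a minimal illustrative sketch, assuming that synthetic OOD samples are generated by adding heavy-tailed Student-t perturbations to ID points, and that samples landing back near the ID data are down-weighted. All names and the distance-based weighting rule are hypothetical, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ID training set: a 2-D Gaussian blob.
X_id = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Heavy-tailed Student-t perturbations (df=3) push some synthetic samples
# far from the ID region while leaving many near its boundary.
df, scale = 3.0, 0.5
X_ood = X_id + scale * rng.standard_t(df, size=X_id.shape)

# Down-weight synthetic OOD samples that landed back inside the ID region,
# here using distance from the ID mean as a crude stand-in for an ID score.
dist = np.linalg.norm(X_ood - X_id.mean(axis=0), axis=1)
w_ood = np.clip((dist - 1.0) / 2.0, 0.0, 1.0)  # ~0 deep inside the ID core

print(X_ood.shape, float(w_ood.min()), float(w_ood.max()))
```

A supervised detector trained on `(X_id, X_ood)` would then scale each OOD sample's loss by `w_ood`, so ambiguous near-ID samples contribute less to the decision boundary.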
Problem

Research questions and friction points this paper is trying to address.

Which real-world inputs count as ID vs. OOD is ambiguous, complicating evaluation of OOD detectors
High-dimensional, sparse, or biased training data makes detector behaviour hard to visualise and diagnose
Conflicts between real ID and synthetic OOD samples blur the decision boundary of supervised detectors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual benchmarks for OOD detection evaluation
Synthesizing OOD inputs for supervised training
Improving ID-OOD boundary precision with t-poking and OOD sample weighting
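The "needles in haystacks" benchmark idea above can be sketched concretely: ID data confined to a thin linear subspace of a high-dimensional space, with OOD points drawn from the full ambient space. This is an illustrative reconstruction under assumed parameters, not the paper's benchmark code, and the PCA-reconstruction-error detector is just one simple baseline of the kind the paper evaluates.

```python
import numpy as np

rng = np.random.default_rng(42)
D, N = 64, 1000  # ambient dimension (the "haystack"), samples per set

# ID "needle": points confined to a thin 2-D linear subspace of the
# D-dimensional space, plus a little isotropic noise.
basis = rng.normal(size=(2, D))
X_id = rng.normal(size=(N, 2)) @ basis + 0.01 * rng.normal(size=(N, D))

# OOD "haystack": points drawn from the full ambient space.
X_ood = rng.normal(size=(N, D))

# A minimal baseline detector: PCA reconstruction error, fitted on ID data.
mu = X_id.mean(axis=0)
_, _, Vt = np.linalg.svd(X_id - mu, full_matrices=False)
P = Vt[:2]  # the top-2 principal directions span the needle

def ood_score(X):
    """Distance from the fitted ID subspace; higher means more OOD-like."""
    Xc = X - mu
    return np.linalg.norm(Xc - Xc @ P.T @ P, axis=1)

# Points in the needle reconstruct well; haystack points do not.
print(ood_score(X_id).mean(), ood_score(X_ood).mean())
```

Because the whole construction is 2-D-in-64-D, detector scores can be plotted along the needle and across the haystack, which is the visual-interpretability point of the toy benchmarks.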
Juniper Tyree
Institute for Atmospheric and Earth System Research, University of Helsinki, Helsinki, Finland
Andreas Rupp
Saarland University
numerical analysis, evolving porous media, discontinuous Galerkin
Petri Clusius
Institute for Atmospheric and Earth System Research, University of Helsinki, Helsinki, Finland
Michael Boy
Institute for Atmospheric and Earth System Research, University of Helsinki, Helsinki, Finland; LUT School of Engineering Sciences, LUT University, Lappeenranta, Finland