A Median Perspective on Unlabeled Data for Out-of-Distribution Detection

📅 2025-10-07

🤖 AI Summary

Problem: Out-of-distribution (OOD) detection remains challenging with real-world unlabeled data, in which in-distribution (InD) and OOD samples are arbitrarily mixed and no explicit OOD supervision is available. Method: The paper proposes a median-based framework for robust OOD detection. Exploiting the robustness of the median to outliers, it automatically identifies likely OOD samples in the unlabeled data and then trains an OOD classifier on them jointly with labeled InD data. The authors derive an upper bound on the classification error showing that the method achieves a low misclassification rate. Contribution/Results: The authors present this as the first work to bring median estimation into OOD outlier identification and to establish a provably robust semi-supervised learning paradigm for OOD detection. Experiments show the method outperforming existing state-of-the-art approaches across multiple open-world benchmarks, pairing theoretical guarantees with practical effectiveness.
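The robustness property the summary relies on can be seen in a few lines of Python. This is a generic illustration of how the median behaves under contamination, not code from the paper; the score values are invented for the example.

```python
import statistics

# Scores of in-distribution samples, clustered around 1.0 (illustrative values)
clean = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
# The same scores after two extreme outliers leak into the pool
contaminated = clean + [50.0, 80.0]

for name, data in [("clean", clean), ("contaminated", contaminated)]:
    # The mean averages every value, so outliers drag it; the median only
    # looks at the middle of the sorted data, so it barely moves
    print(f"{name:>12}: mean={statistics.mean(data):.3f}  median={statistics.median(data):.3f}")
```

Adding two extreme values moves the mean from 1.000 to 17.000, while the median only shifts from 1.000 to 1.025.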

📝 Abstract

Out-of-distribution (OOD) detection plays a crucial role in ensuring the robustness and reliability of machine learning systems deployed in real-world applications. Recent approaches have explored the use of unlabeled data, showing potential for enhancing OOD detection capabilities. However, effectively utilizing unlabeled in-the-wild data remains challenging because it mixes in-distribution (InD) and OOD samples. The lack of a distinct set of OOD samples complicates the task of training an optimal OOD classifier. In this work, we introduce Medix, a novel framework designed to identify potential outliers in unlabeled data using the median operation. We base this OOD detection mechanism on the median because it provides a stable estimate of central tendency that is robust to noise and outliers. Using these identified outliers, along with labeled InD data, we train a robust OOD classifier. From a theoretical perspective, we derive error bounds showing that Medix achieves a low error rate. Empirical results further substantiate our claims: Medix outperforms existing methods across the board in open-world settings, confirming the validity of our theoretical insights.
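A minimal sketch of how a median can flag candidate outliers in an unlabeled pool, under stated assumptions: samples are already embedded as feature vectors, InD samples make up the majority of the pool, and the selection rule is a generic median/MAD threshold. The function name `flag_candidate_outliers` and the parameter `k` are hypothetical, and this is not Medix's actual scoring procedure.

```python
import numpy as np

def flag_candidate_outliers(wild_feats: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Mark unlabeled samples that lie far from the median of the pool.

    Hypothetical illustration, not the Medix rule. Assumes InD samples are
    the majority of `wild_feats`, so the median-based centre and threshold
    are determined mainly by them.
    """
    # Coordinate-wise median: a robust centre for the mixed pool
    center = np.median(wild_feats, axis=0)
    # Euclidean distance of every sample to that centre
    dist = np.linalg.norm(wild_feats - center, axis=1)
    # Robust spread via the median absolute deviation (MAD); the factor 1.4826
    # rescales MAD to match the standard deviation of a normal distribution
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    threshold = med + k * 1.4826 * mad
    # True marks a candidate OOD sample
    return dist > threshold
```

With, for example, 300 InD vectors and 100 shifted OOD vectors, the coordinate-wise median stays near the InD cluster, so the shifted vectors are the ones that exceed the threshold. Using the mean as the centre would let the OOD portion pull it noticeably further toward itself.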

Problem

Research questions and pain points the paper addresses.

Detecting outliers in mixed unlabeled data that contains both InD and OOD samples

Training robust OOD classifiers without distinct OOD training samples

Improving OOD detection performance in open-world, real-world applications
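Mixed unlabeled data of this kind is commonly formalized in the in-the-wild OOD literature with a Huber-style contamination model. The page does not state Medix's exact assumptions, so the following is a general sketch, with $\pi$ denoting the unknown fraction of OOD samples in the unlabeled pool:

```latex
\mathbb{P}_{\text{wild}} = (1 - \pi)\,\mathbb{P}_{\text{in}} + \pi\,\mathbb{P}_{\text{out}}, \qquad \pi \in [0, 1)
```

Under this model, a median-based location estimate cannot be dragged arbitrarily far by the OOD component as long as $\pi < 1/2$, because the univariate median (and hence the coordinate-wise median) has a breakdown point of 50%.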

Innovation

Methods, ideas, or system contributions that make the work stand out.

Medix uses the median operation to detect outliers

The framework trains an OOD classifier on the identified outliers together with labeled InD data

The method achieves a low error rate, backed by derived error bounds
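A hedged sketch of the second stage: training a binary OOD classifier on labeled InD features (label 0) and identified outliers (label 1). A logistic-regression head fitted by plain gradient descent is a deliberate stand-in chosen for brevity; the page does not describe Medix's model or loss, and `train_ood_classifier`, `ood_probability`, `lr`, and `epochs` are hypothetical names.

```python
import numpy as np

def train_ood_classifier(ind_feats, outlier_feats, lr=0.5, epochs=500):
    """Fit a logistic-regression OOD head: label 0 = InD, 1 = identified outlier.

    Hypothetical sketch; not Medix's actual classifier or training objective.
    """
    X = np.vstack([ind_feats, outlier_feats])
    y = np.concatenate([np.zeros(len(ind_feats)), np.ones(len(outlier_feats))])
    # Standardize features so plain gradient descent is well conditioned
    mu = X.mean(axis=0)
    sd = X.std(axis=0) + 1e-8
    Xs = (X - mu) / sd
    w = np.zeros(Xs.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))  # predicted P(OOD)
        err = p - y  # per-sample gradient of the log-loss w.r.t. the logit
        w -= lr * Xs.T @ err / len(y)
        b -= lr * err.mean()
    return w, b, mu, sd

def ood_probability(feats, w, b, mu, sd):
    """Predicted probability that each row of `feats` is OOD."""
    z = ((feats - mu) / sd) @ w + b
    return 1.0 / (1.0 + np.exp(-z))
```

In a full pipeline, `outlier_feats` would hold the samples flagged from the unlabeled pool, and a threshold on `ood_probability` (for example 0.5, tuned in practice) would separate InD from OOD inputs at test time.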
