🤖 AI Summary
Problem: Out-of-distribution (OOD) detection remains challenging on real-world unlabeled data, where in-distribution (InD) and OOD samples are arbitrarily mixed and no explicit OOD supervision is available.
Method: This paper proposes Medix, a median-based framework for robust OOD detection. Exploiting the median's robustness to noise and outliers, it automatically identifies potential OOD samples in unlabeled data, then trains an OOD classifier on those identified outliers together with labeled InD data. The authors also derive theoretical error bounds showing that the method achieves a low error rate.
Contribution/Results: To our knowledge, this is the first work to incorporate median estimation into OOD outlier identification and to establish a provably robust semi-supervised learning paradigm for OOD detection. Extensive experiments demonstrate that the method significantly outperforms existing state-of-the-art approaches across multiple open-world benchmarks, achieving both theoretical rigor and practical effectiveness.
📝 Abstract
Out-of-distribution (OOD) detection plays a crucial role in ensuring the robustness and reliability of machine learning systems deployed in real-world applications. Recent approaches have explored the use of unlabeled data, showing potential for enhancing OOD detection capabilities. However, effectively utilizing unlabeled in-the-wild data remains challenging because it mixes in-distribution (InD) and OOD samples. Without a distinct set of OOD samples, training an optimal OOD classifier is difficult. In this work, we introduce Medix, a novel framework that identifies potential outliers in unlabeled data using the median operation. We use the median as the basis of the OOD detection mechanism because it provides a stable estimate of central tendency that is robust to noise and outliers. Using these identified outliers together with labeled InD data, we train a robust OOD classifier. From a theoretical perspective, we derive error bounds showing that Medix achieves a low error rate. Empirical results substantiate these claims: Medix outperforms existing methods across the board in open-world settings, consistent with our theoretical insights.
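The abstract does not spell out how the median is applied, so the following is only a generic illustration of why a median-based rule is robust, not Medix's actual procedure. It assumes each unlabeled sample has already been reduced to a single scalar score (a hypothetical quantity here), and it flags samples using the median absolute deviation (MAD) with the conventional modified z-score cutoff of 3.5. The function name `flag_outliers` and the toy scores are made up for this sketch.

```python
from statistics import median


def flag_outliers(scores, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Illustrative median/MAD filter; an assumption for this sketch,
    not the procedure described in the paper.
    """
    scores = list(scores)
    center = median(scores)  # robust estimate of central tendency
    # Median absolute deviation: a robust estimate of spread.
    mad = median([abs(s - center) for s in scores])
    if mad == 0:
        # More than half the scores are identical; no usable scale.
        return []
    # 0.6745 rescales the MAD to match the standard deviation
    # under a normal distribution (the usual modified z-score).
    return [
        i for i, s in enumerate(scores)
        if 0.6745 * abs(s - center) / mad > threshold
    ]


# Hypothetical per-sample scores: six typical values and two extreme ones.
toy_scores = [0.9, 1.1, 1.0, 0.95, 1.05, 8.0, 1.02, 7.5]
print(flag_outliers(toy_scores))  # [5, 7]
```

In this toy example the two extreme scores pull the mean up to about 2.69, while the median stays at 1.035, close to the typical values. That stability is the property the abstract appeals to. In a pipeline like the one described, the flagged samples would then serve as candidate outliers when training the OOD classifier alongside labeled InD data.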