🤖 AI Summary
Traditional conformal prediction (CP) treats classes as a flat, unstructured set, ignoring semantic relationships and hierarchical structure among labels, which limits the interpretability and practical utility of its prediction sets. To address this, the authors propose hierarchical conformal classification (HCC), an extension of CP that incorporates a tree-structured class hierarchy into both the structure and semantics of prediction sets. HCC is formulated as a constrained optimization problem whose solutions are prediction sets composed of nodes at different levels of the hierarchy, while preserving finite-sample coverage guarantees. To handle the combinatorial nature of the problem, the authors show that a much smaller, well-structured subset of candidate solutions suffices to ensure coverage without sacrificing optimality. Experiments on three new benchmarks spanning audio, image, and text data highlight the advantages of the approach, and a user study shows that annotators significantly prefer hierarchical over flat prediction sets, suggesting improved trustworthiness and decision-support utility.
📝 Abstract
Conformal prediction (CP) is a powerful framework for quantifying uncertainty in machine learning models, offering reliable predictions with finite-sample coverage guarantees. When applied to classification, CP produces a prediction set of possible labels that is guaranteed to contain the true label with high probability, regardless of the underlying classifier. However, standard CP treats classes as flat and unstructured, ignoring domain knowledge such as semantic relationships or hierarchical structure among class labels. This paper presents hierarchical conformal classification (HCC), an extension of CP that incorporates class hierarchies into both the structure and semantics of prediction sets. We formulate HCC as a constrained optimization problem whose solutions yield prediction sets composed of nodes at different levels of the hierarchy, while maintaining coverage guarantees. To address the combinatorial nature of the problem, we formally show that a much smaller, well-structured subset of candidate solutions suffices to ensure coverage while upholding optimality. An empirical evaluation on three new benchmarks consisting of audio, image, and text data highlights the advantages of our approach, and a user study shows that annotators significantly prefer hierarchical over flat prediction sets.
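To make the standard (flat) CP baseline concrete, below is a minimal sketch of split conformal classification, the setting the paper extends. It is not the paper's HCC method; the nonconformity score `1 - p(y | x)`, the toy softmax "classifier", and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conformal_quantile(scores, alpha):
    # Finite-sample-corrected quantile: ceil((n+1)(1-alpha))/n level,
    # which is what yields the coverage guarantee for split CP.
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

# Toy "classifier": softmax probabilities over 4 classes (stand-in for
# any underlying model; CP is agnostic to the classifier).
n_cal, n_classes = 500, 4
logits = rng.normal(size=(n_cal, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = np.array([rng.choice(n_classes, p=p) for p in probs])

# Nonconformity score on a held-out calibration set:
# 1 minus the model probability of the true class.
cal_scores = 1.0 - probs[np.arange(n_cal), labels]
qhat = conformal_quantile(cal_scores, alpha=0.1)  # target 90% coverage

def prediction_set(p):
    # Flat prediction set: every label whose score is within the
    # calibrated threshold. HCC would instead return hierarchy nodes.
    return np.flatnonzero(1.0 - p <= qhat)

print(prediction_set(np.array([0.7, 0.2, 0.05, 0.05])))
```

Under exchangeability, the set returned by `prediction_set` contains the true label with probability at least `1 - alpha`, regardless of how poor the underlying probabilities are; the flat structure of this set is exactly what HCC replaces with nodes drawn from the class hierarchy.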