🤖 AI Summary
Extreme Multi-Label Classification (XMLC) faces core challenges including massive label spaces, severe long-tail label distributions, and the poor scalability of conventional methods, in particular the difficulty of learning effective embeddings for tail labels. This paper presents a systematic survey of XMLC research. First, it reviews the evaluation metrics the community has developed to assess predictions separately on head and tail labels. Second, it unifies diverse embedding paradigms, including compressed sensing, tree-based indexing, attention mechanisms, singular value decomposition (SVD), and hashing, which together span low-dimensional projections, deep learning, linear algebra, clustering, and tree models. Third, it proposes a taxonomy of XMLC methods that clarifies their applicability boundaries and the inherent trade-offs among accuracy, efficiency, and tail-label coverage. The survey offers a basis for algorithm selection and for designing new approaches that mitigate long-tail bias and improve predictive performance on tail labels.
📝 Abstract
Extreme multi-label classification, or XMLC, is an active area of interest in machine learning. Compared to traditional multi-label classification, the number of labels here is extremely large, hence the name. Classical one-versus-all classification does not scale in this setting because of the large number of labels; the same is true for any other classifier. Embedding labels and features into a lower-dimensional space is therefore a common first step in many XMLC methods. A further issue is the existence of head and tail labels, where tail labels are those that occur in a relatively small number of samples. Tail labels create difficulties during embedding. The area has attracted a wide range of approaches, including bit compression motivated by compressed sensing, tree-based embeddings, deep-learning-based latent space embeddings (including those that use attention weights), linear-algebra-based embeddings such as SVD, clustering, and hashing, to name a few. The community has also developed a useful set of metrics for assessing whether predictions are correct for head or tail labels.
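To make the embedding step concrete, the following is a minimal sketch of an SVD-based label embedding, one of the families named above. The abstract does not specify any particular algorithm, so everything here is an assumption for illustration: the function names `fit_label_embedding` and `predict_top_k` are hypothetical, the ridge regression from features to label codes is one simple choice among many, and the dense `numpy` SVD would not scale to a genuinely extreme label space.

```python
import numpy as np


def fit_label_embedding(X, Y, dim, reg=1e-3):
    """Compress labels with a truncated SVD, then regress features onto the codes.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label matrix.
    dim: size of the low-dimensional label space (assumed <= min(Y.shape)).
    reg: ridge penalty (an illustrative default, not taken from the survey).
    """
    # The top `dim` right singular vectors of Y span the label subspace.
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    V = Vt[:dim].T                      # (n_labels, dim), orthonormal columns
    Z = Y @ V                           # (n_samples, dim) compressed label codes

    # Ridge regression from features to codes: W = (X^T X + reg*I)^{-1} X^T Z
    n_features = X.shape[1]
    W = np.linalg.solve(X.T @ X + reg * np.eye(n_features), X.T @ Z)
    return W, V


def predict_top_k(X, W, V, k):
    """Predict codes, decode them back to label space, return top-k label ids."""
    scores = (X @ W) @ V.T              # (n_samples, n_labels)
    return np.argsort(-scores, axis=1)[:, :k]
```

In a sketch like this, labels that occur in very few samples contribute little to the leading singular vectors, which is one way to see why tail labels create difficulties during embedding.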