🤖 AI Summary
This work addresses the challenges of biased learning and unreliable predictions in multilingual emotion recognition caused by missing annotations and emotional ambiguity. To this end, the authors propose an uncertainty-aware multi-label classification framework that explicitly models label uncertainty to prevent missing labels from being misinterpreted as negative samples. The approach integrates a shared multilingual encoder, language-specific optimization, an entropy-driven ambiguity weighting mechanism, and a novel objective that combines positive-unlabeled learning regularization with a mask-aware loss. Experiments on English, Spanish, and Arabic benchmarks demonstrate that the method significantly outperforms strong baselines, offering improved training stability, enhanced robustness to sparse annotations, and greater prediction interpretability.
📝 Abstract
Contemporary knowledge-based systems increasingly rely on multilingual emotion identification to support intelligent decision-making, yet they face major challenges due to emotional ambiguity and incomplete supervision. Emotion recognition from text is inherently uncertain because multiple emotional states often co-occur and emotion annotations are frequently missing or heterogeneous. Most existing multi-label emotion classification methods assume fully observed labels and rely on deterministic learning objectives, which can lead to biased learning and unreliable predictions under partial supervision. This paper introduces Reasoning under Ambiguity, an uncertainty-aware framework for multilingual multi-label emotion classification that explicitly aligns learning with annotation uncertainty. The proposed approach uses a shared multilingual encoder with language-specific optimization and an entropy-based ambiguity weighting mechanism that down-weights highly ambiguous training instances rather than treating missing labels as negative evidence. A mask-aware objective with positive-unlabeled regularization is further incorporated to enable robust learning under partial supervision. Experiments on English, Spanish, and Arabic emotion classification benchmarks demonstrate consistent improvements over strong baselines across multiple evaluation metrics, along with improved training stability, greater robustness to annotation sparsity, and enhanced interpretability.
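The two mechanisms the abstract names, entropy-based ambiguity weighting and a mask-aware loss that ignores unobserved labels instead of treating them as negatives, can be sketched as follows. This is an illustrative NumPy sketch under our own assumptions, not the paper's implementation: the function names, the per-label binary entropy averaged into an instance weight, and the 0/1 observation mask are all choices made here for clarity.

```python
import numpy as np

def entropy_weight(probs, eps=1e-12):
    """Instance weight from mean per-label binary entropy (in bits) of the
    predicted probabilities: a maximally ambiguous instance (all probs 0.5)
    gets weight 0, a confident one gets weight near 1. Hypothetical form of
    the paper's entropy-driven ambiguity weighting."""
    p = np.clip(probs, eps, 1 - eps)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))  # entropy per label
    return 1.0 - h.mean(axis=-1)

def mask_aware_bce(probs, labels, mask, eps=1e-12):
    """Binary cross-entropy averaged only over observed labels (mask == 1);
    missing labels (mask == 0) contribute nothing, so they are never read
    as negative evidence."""
    p = np.clip(probs, eps, 1 - eps)
    ce = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return (ce * mask).sum() / np.maximum(mask.sum(), 1)
```

For example, an instance with predictions `[0.5, 0.5]` receives weight 0 and is effectively excluded, while `[0.99, 0.01]` receives a weight above 0.9; in the masked loss, a label column whose annotation is missing is simply dropped from the average rather than penalized as a negative.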