CMHL: Contrastive Multi-Head Learning for Emotionally Consistent Text Classification

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the issues of logical inconsistency in text sentiment classification and overreliance on large language models or complex ensembles by proposing a lightweight single-model architecture with only 125M parameters. The approach integrates psychological priors with multi-task learning to jointly predict primary emotion, valence, and intensity. Grounded in Russell’s circumplex model of affect, the method employs a multi-head joint prediction mechanism and introduces a novel contrastive contradiction loss function to enforce consistency across affective dimensions. Experimental results demonstrate that the model achieves an F1 score of 93.75% on the Emotion dataset—surpassing a large language model 56 times its size—and attains an F1 of 72.50% and recall of 73.30% on the SWMH dataset, outperforming specialized models such as MentalBERT.
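The summary above describes a multi-head joint prediction mechanism: a shared encoder representation feeds separate heads for primary emotion, valence, and intensity. The paper's implementation is not shown on this page; the following is a minimal structural sketch (all weights, head shapes, and names are illustrative assumptions, not the authors' code) of how three task heads can share one feature vector so that a joint loss can tie their predictions together.

```python
def linear(weights, bias, features):
    """A plain linear layer: weights is a list of rows, one per output."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

class MultiHeadClassifier:
    """Illustrative multi-head joint predictor: every head reads the
    SAME shared encoder features, mirroring the multi-task setup
    (primary emotion, valence, intensity) described in the summary."""

    def __init__(self, emotion_head, valence_head, intensity_head):
        # each head is a (weights, bias) pair over the shared features
        self.heads = {"emotion": emotion_head,
                      "valence": valence_head,
                      "intensity": intensity_head}

    def forward(self, shared_features):
        # one forward pass yields all three predictions jointly
        return {name: linear(w, b, shared_features)
                for name, (w, b) in self.heads.items()}

# Usage with tiny hand-picked weights over a 2-d shared feature:
emotion_head   = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
valence_head   = ([[0.5, -0.5]], [0.0])
intensity_head = ([[0.3, 0.3]], [0.1])
model = MultiHeadClassifier(emotion_head, valence_head, intensity_head)
scores = model.forward([2.0, 1.0])
```

Because all heads branch from one representation, consistency constraints (such as the contradiction loss below) can be applied across their outputs during training.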

📝 Abstract
Textual Emotion Classification (TEC) is one of the most difficult NLP tasks. State-of-the-art approaches rely on large language models (LLMs) and multi-model ensembles. In this study, we challenge the assumption that larger scale or more complex models are necessary for improved performance. To improve logical consistency, we introduce CMHL, a novel single-model architecture that explicitly models the logical structure of emotions through three key innovations: (1) multi-task learning that jointly predicts primary emotions, valence, and intensity; (2) psychologically grounded auxiliary supervision derived from Russell's circumplex model; and (3) a novel contrastive contradiction loss that enforces emotional consistency by penalizing mutually incompatible predictions (e.g., simultaneous high confidence in joy and anger). With just 125M parameters, our model outperforms 56x larger LLMs and sLM ensembles, setting a new state-of-the-art F1 score of 93.75% (versus 86.13%-93.2%) on the dair-ai Emotion dataset. We further show cross-domain generalization on the Reddit Suicide Watch and Mental Health Collection (SWMH) dataset, outperforming domain-specific models such as MentalBERT and MentalRoBERTa with an F1 score of 72.50% (versus 68.16%-72.16%) and a recall of 73.30% (versus 67.05%-70.89%), which translates to enhanced sensitivity for detecting mental health distress. Our work establishes that architectural intelligence, not parameter count, drives progress in TEC. By embedding psychological priors and explicit consistency constraints, a well-designed single model can outperform both massive LLMs and complex ensembles, offering an efficient, interpretable, and clinically relevant paradigm for affective computing.
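The abstract's contrastive contradiction loss penalizes simultaneous high confidence in mutually incompatible emotions (e.g., joy and anger). The paper's exact formulation is not reproduced here; as one plausible minimal sketch, a penalty can be built from the product of per-emotion confidences over incompatible pairs, which is near zero unless BOTH emotions are predicted strongly. The emotion names and pair list below are illustrative assumptions loosely inspired by opposite-valence regions of Russell's circumplex, not the paper's actual configuration.

```python
import math

# Hypothetical label set and incompatible pairs (illustrative only)
EMOTIONS = ["joy", "sadness", "anger", "fear", "love", "surprise"]
INCOMPATIBLE = [("joy", "sadness"), ("joy", "anger"), ("love", "fear")]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def contradiction_penalty(logits):
    """Penalty for one example: sum over incompatible pairs of the
    product of per-emotion confidences. Each product term is large
    only when BOTH emotions in the pair are assigned high confidence,
    so logically consistent predictions are barely penalized."""
    idx = {name: i for i, name in enumerate(EMOTIONS)}
    probs = [sigmoid(z) for z in logits]
    return sum(probs[idx[a]] * probs[idx[b]] for a, b in INCOMPATIBLE)

# A prediction confident in both joy AND anger is penalized far more
# than one confident in joy alone (logits per emotion, in list order).
contradictory = [4.0, -3.0, 3.5, -2.0, -1.0, 0.0]
consistent    = [4.0, -3.0, -4.0, -2.0, -1.0, 0.0]
```

In training, a term like this would be added to the primary classification loss with a weighting coefficient, steering the multi-head model away from mutually incompatible outputs.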
Problem

Research questions and friction points this paper is trying to address.

Textual Emotion Classification
Emotional Consistency
Logical Consistency
Affective Computing
Psychological Priors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Multi-Head Learning
Emotional Consistency
Multi-task Learning
Contrastive Contradiction Loss
Psychological Priors