Enhancing Mental Health Classification with Layer-Attentive Residuals and Contrastive Feature Learning

📅 2026-03-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses key challenges in mental health text classification—namely, semantic overlap among categories, strong contextual dependencies, and entangled feature spaces. To tackle these issues, the authors propose a layered attention residual aggregation mechanism that dynamically fuses multi-layer representations from a Transformer to preserve high-level semantics. Additionally, they introduce temperature-scaled supervised contrastive learning to reshape the geometric structure of the feature space, thereby enhancing discriminability among easily confusable classes. Combined with a progressive weighting strategy, the model achieves 74.36% accuracy on the SWMH benchmark, significantly outperforming MentalBERT and MentalRoBERTa by 2.2%–3.25% in accuracy and by 2.41 recall points, all without requiring domain-specific pretraining.
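The layered attention residual aggregation described above can be sketched as a small module that learns a softmax weight per Transformer layer, fuses all layer outputs, and adds a residual from the top layer to preserve high-level semantics. This is an illustrative sketch under assumed shapes and names, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class LayerAttentiveResidualAggregation(nn.Module):
    """Sketch: learn attention weights over all Transformer layer outputs,
    fuse them, and add a residual from the final layer. Module and
    parameter names are assumptions for illustration."""

    def __init__(self, num_layers: int, hidden_size: int):
        super().__init__()
        # one learnable importance score per layer; after training these
        # weights offer the interpretability the paper mentions
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, layer_states):
        # layer_states: list of (batch, seq, hidden) tensors, one per layer
        stacked = torch.stack(layer_states, dim=0)           # (L, B, S, H)
        weights = torch.softmax(self.layer_logits, dim=0)    # (L,)
        fused = (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)
        # residual from the top layer keeps high-level semantics dominant
        return self.norm(fused + layer_states[-1])
```

A usage pattern would feed it the `hidden_states` tuple returned by a Hugging Face encoder called with `output_hidden_states=True`.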

📝 Abstract
Mental health classification is challenging for several reasons. Categories overlap semantically, and the signs of mental health issues depend on situational context, making classification difficult. Although fine-tuning Transformers has improved performance on mental health classification, standard cross-entropy training tends to produce entangled feature spaces and fails to exploit the information distributed across Transformer layers. We present a representation-focused framework that improves mental health classification through two methods. First, layer-attentive residual aggregation uses residual connections to weigh and fuse representations from all Transformer layers while preserving high-level semantics. Second, supervised contrastive feature learning applies temperature-scaled supervised contrastive learning with progressive weighting to restructure the feature space, widening the geometric margin between confusable mental health conditions and reducing class overlap. With an accuracy of 74.36%, the proposed method is the best performing on the SWMH benchmark, outperforming domain-specialized models such as MentalBERT and MentalRoBERTa by margins of 2.2%-3.25% in accuracy and by 2.41 recall points over the strongest baseline. These findings show that carefully designed representation geometry and layer-aware residual integration can surpass domain-adaptive pretraining for mental health text classification, while also providing enhanced interpretability through learnt layer importance.
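The temperature-scaled supervised contrastive objective in the abstract follows the general SupCon pattern: pull same-class embeddings together and push different-class embeddings apart, with a temperature controlling the sharpness of the similarity distribution. The sketch below is an assumed formulation, and the temperature value and the progressive-weighting schedule in the comments are illustrative, not the paper's reported settings:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """SupCon-style sketch. features: (B, D) embeddings; labels: (B,).
    The temperature default is an assumption for illustration."""
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t() / temperature                    # (B, B)
    batch = labels.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=sim.device)
    # exclude each anchor's similarity with itself
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # positives: same label, different index
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts)
    # average only over anchors that have at least one positive
    return per_anchor[pos_mask.any(dim=1)].mean()

# Progressive weighting (assumed schedule): ramp the contrastive term in
# over training, e.g.
#   lam = min(1.0, epoch / warmup_epochs)
#   loss = ce_loss + lam * supervised_contrastive_loss(features, labels)
```

In practice the contrastive term would be combined with cross-entropy on the classifier head, with the ramp preventing the geometric restructuring from dominating early training.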
Problem

Research questions and friction points this paper is trying to address.

mental health classification
class overlap
context-dependent symptoms
feature entanglement
confusable mental disorders
Innovation

Methods, ideas, or system contributions that make the work stand out.

layer-attentive residuals
contrastive feature learning
mental health classification
feature space geometry
transformer representation fusion