🤖 AI Summary
This work addresses the challenge of detecting rare psychological defense mechanisms, such as the “Unclear” category, in dialogue text, where extreme class imbalance severely degrades model performance. The authors propose an iterative, imbalance-aware fine-tuning framework that adapts the Qwen3-8B large language model with QLoRA. The approach combines grouped stratified cross-validation, minority-class lexical augmentation with round-robin sampling, logit bias tuning, and model ensembling. On the PsyDefDetect 2026 benchmark, the system achieves a macro F1 score of 0.3917, ranking 4th among 21 registered teams and outperforming the Ministral-8B baseline (0.3148) by 7.7 absolute points. The F1 score for the “Unclear” class reaches 0.797, up from near zero, alongside substantially improved minority-class recall and a smaller gap between validation and test performance.
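To make the grouped stratified cross-validation idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the greedy assignment rule, the function name `grouped_stratified_folds`, and the toy dialogue IDs are all assumptions made for illustration. The point it shows is that every utterance from one dialogue lands in the same fold, while per-class counts are kept roughly even across folds.

```python
from collections import Counter, defaultdict


def grouped_stratified_folds(labels, groups, n_folds=5):
    """Assign each group (e.g. one dialogue) to a single fold.

    Hypothetical greedy heuristic: groups are placed one at a time into
    the fold that keeps per-class shares closest to 1 / n_folds.
    Returns one fold index per sample.
    """
    # Count the labels contained in each group.
    group_counts = defaultdict(Counter)
    for label, group in zip(labels, groups):
        group_counts[group][label] += 1

    total = Counter(labels)
    fold_counts = [Counter() for _ in range(n_folds)]
    group_to_fold = {}

    # Place large groups first; they are the hardest to balance later.
    ordered = sorted(
        group_counts,
        key=lambda g: (-sum(group_counts[g].values()), str(g)),
    )
    for group in ordered:
        best_fold, best_cost = 0, None
        for fold in range(n_folds):
            # Cost: squared deviation of each fold's per-class share
            # from the ideal share, if the group went into `fold`.
            cost = 0.0
            for f in range(n_folds):
                for label, n_total in total.items():
                    extra = group_counts[group][label] if f == fold else 0
                    share = (fold_counts[f][label] + extra) / n_total
                    cost += (share - 1.0 / n_folds) ** 2
            if best_cost is None or cost < best_cost:
                best_fold, best_cost = fold, cost
        group_to_fold[group] = best_fold
        fold_counts[best_fold].update(group_counts[group])

    return [group_to_fold[g] for g in groups]


# Toy data: class ids per utterance and the dialogue each one belongs to.
labels = [0, 0, 1, 0, 8, 0, 1, 8, 0, 0]
groups = ["d1", "d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4", "d5"]
folds = grouped_stratified_folds(labels, groups, n_folds=2)
```

Keeping dialogues intact matters because utterances from the same conversation share speakers and context, so splitting them across training and validation folds would inflate validation scores.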
📝 Abstract
Detecting psychological defense mechanisms in conversational text remains a challenging clinical NLP problem. For the PsyDefDetect 2026 shared task (nine-class utterance classification evaluated via macro F1), our team LinguIUTics achieves a macro F1 score of 0.3917 on the official positive-class leaderboard, ranking 4th out of 21 registered teams and improving over the Ministral-8B task baseline (0.3148 macro F1) by 7.7 absolute points (24.4% relative). BERT-family encoders and zero-shot LLMs proved ineffective on rare classes due to severe class imbalance, which led us to QLoRA fine-tuning of Qwen3-8B. We rely on three key strategies: grouped stratified cross-validation (to prevent leakage), minority-class round-robin lexical augmentation, and a post-processing pipeline with logit bias tuning and ensemble blending. Together, these components close much of the validation-to-leaderboard gap and substantially improve minority-class recall, driving the critical "Unclear" class (Level 8) from near-zero performance to an F1 score of 0.797.
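The logit bias tuning step can be illustrated with a small sketch. The abstract does not say how the biases were searched, so the coordinate-wise grid search below, the grid values, and the names `macro_f1`, `predict`, and `tune_logit_bias` are assumptions rather than the team's pipeline. The idea shown is simply that a per-class offset added to held-out logits can shift predictions toward under-predicted classes when the offsets are chosen to maximize macro F1.

```python
def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1; a class with no TP, FP or FN scores 0."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def predict(logits, bias):
    """Argmax over biased logits (ties go to the lowest class index)."""
    return [
        max(range(len(bias)), key=lambda c: row[c] + bias[c])
        for row in logits
    ]


def tune_logit_bias(logits, y_true, n_classes,
                    grid=(-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0), sweeps=2):
    """Hypothetical coordinate search: adjust one class bias at a time and
    keep a change only if validation macro F1 strictly improves."""
    bias = [0.0] * n_classes
    best = macro_f1(y_true, predict(logits, bias), n_classes)
    for _ in range(sweeps):
        for c in range(n_classes):
            for value in grid:
                trial = bias[:c] + [value] + bias[c + 1:]
                score = macro_f1(y_true, predict(logits, trial), n_classes)
                if score > best:
                    best, bias = score, trial
    return bias, best


# Toy validation set: class 1 is always out-scored by class 0 before tuning.
val_logits = [[2.0, 1.5], [2.0, 0.0], [2.0, 1.8], [2.0, 0.1]]
val_labels = [1, 0, 1, 0]
baseline = macro_f1(val_labels, predict(val_logits, [0.0, 0.0]), 2)
tuned_bias, tuned_score = tune_logit_bias(val_logits, val_labels, 2)
```

Because the biases are fit on validation data, they should be tuned inside the same grouped folds used for model selection; otherwise the tuned score overstates what the leaderboard will show.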