Class Distillation with Mahalanobis Contrast: An Efficient Training Paradigm for Pragmatic Language Understanding Tasks

📅 2025-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the efficient detection of fine-grained linguistic phenomena (such as gender bias, metaphor, and irony) in low-resource, highly heterogeneous settings, aiming to reduce both computational overhead and data dependency. Methodologically, we propose a Mahalanobis distance-based, class-structure-aware loss function that jointly optimizes intra-class compactness and inter-class separability. We further integrate lightweight language model fine-tuning with a contrastive class distillation framework to enable accurate identification of sparse target classes and to enhance decision interpretability. Empirically, our approach significantly outperforms strong baselines across three benchmark tasks. Remarkably, with fewer than 1% of their parameters, it achieves performance on par with multiple large language models. This work establishes a novel paradigm for safe, transparent, and interpretable analysis of social discourse under resource-constrained conditions.
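
To make the loss concrete, the sketch below is a minimal PyTorch rendering of a Mahalanobis-contrast objective of the kind described above. The function name, margin term, and covariance regularization are illustrative assumptions, not the authors' released implementation.

```python
import torch

def mahalanobis_contrast_loss(embeddings, labels, margin=1.0, eps=1e-5):
    """Illustrative sketch: pull target-class embeddings toward their class
    center under the Mahalanobis metric (intra-class compactness) and push
    background embeddings beyond a margin (inter-class separability).
    Assumes each batch contains samples from both classes."""
    target = embeddings[labels == 1]      # sparse, well-defined target class
    background = embeddings[labels == 0]  # heterogeneous background

    # Class-structure statistics estimated from the target class.
    mu = target.mean(dim=0)
    centered = target - mu
    cov = centered.T @ centered / max(len(target) - 1, 1)
    cov = cov + eps * torch.eye(cov.size(0), device=cov.device)  # regularize
    cov_inv = torch.linalg.inv(cov)

    def sq_mahalanobis(x):
        d = x - mu
        return torch.einsum('bi,ij,bj->b', d, cov_inv, d)

    compactness = sq_mahalanobis(target).mean()                        # pull in
    separability = torch.relu(margin - sq_mahalanobis(background)).mean()  # push out
    return compactness + separability
```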

📝 Abstract
Detecting deviant language such as sexism, or nuanced language such as metaphors or sarcasm, is crucial for enhancing the safety, clarity, and interpretation of online social discourse. While existing classifiers deliver strong results on these tasks, they often come with significant computational cost and high data demands. In this work, we propose Class Distillation (ClaD), a novel training paradigm that targets the core challenge: distilling a small, well-defined target class from a highly diverse and heterogeneous background. ClaD integrates two key innovations: (i) a loss function informed by the structural properties of class distributions, based on Mahalanobis distance, and (ii) an interpretable decision algorithm optimized for class separation. Across three benchmark detection tasks -- sexism, metaphor, and sarcasm -- ClaD outperforms competitive baselines, and even with smaller language models and orders of magnitude fewer parameters, achieves performance comparable to several large language models (LLMs). These results demonstrate ClaD as an efficient tool for pragmatic language understanding tasks that require gleaning a small target class from a larger heterogeneous background.
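
Innovation (ii) can be pictured as a single thresholded distance test, which is what makes the decision interpretable. The sketch below (NumPy; the helper names are hypothetical) fits the cutoff that maximizes F1 on held-out distances and then labels a point as the target class iff it falls within that cutoff; the paper's actual algorithm may differ in detail.

```python
import numpy as np

def fit_threshold(dists, labels):
    """Pick the Mahalanobis-distance cutoff that best separates the target
    class (label 1) from the background (label 0), scored by F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in np.unique(dists):
        pred = (dists <= t).astype(int)
        tp = np.sum((pred == 1) & (labels == 1))
        fp = np.sum((pred == 1) & (labels == 0))
        fn = np.sum((pred == 0) & (labels == 1))
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

def predict(dists, threshold):
    # One auditable rule: a point belongs to the target class iff its
    # distance to the target-class center is within the fitted cutoff.
    return (dists <= threshold).astype(int)
```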
Problem

Research questions and friction points this paper is trying to address.

Detect deviant or nuanced language efficiently
Reduce computational cost and data requirements
Distill small target classes from diverse backgrounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Class distillation with Mahalanobis contrast
Interpretable decision algorithm for separation
Efficient training with smaller language models