Advancing Text Classification with Large Language Models and Neural Attention Mechanisms

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address three key challenges in text classification (weak long-range dependency modeling, shallow semantic understanding, and severe class imbalance), this paper proposes an end-to-end classification framework that integrates large language models (LLMs) with a dual-path neural attention mechanism. Methodologically, the approach leverages deep semantic representations from LLMs, enhances contextual modeling via multi-head self-attention, and introduces a global–weighted dual-path feature aggregation scheme that strengthens discriminative capability for infrequent classes. Classification is performed through a fully connected output head optimized with cross-entropy loss. Extensive experiments on multiple benchmark datasets demonstrate consistent improvements: F1 scores increase by 3.2–5.8% and AUC by 4.1–6.3%, with particularly pronounced gains in recall. The model also remains robust across varying class-imbalance ratios and hyperparameter configurations, supporting its generalizability and effectiveness.
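The global–weighted dual-path aggregation described above can be sketched in a few lines. This is a minimal illustration under assumed shapes, not the authors' implementation: the scoring vector `w` stands in for whatever learnable parameters the weighted path uses, and the embedding dimensions are arbitrary.

```python
import numpy as np

def dual_path_aggregate(token_embeddings, w):
    """Aggregate per-token embeddings into one text-level vector.

    token_embeddings: (seq_len, dim) contextual embeddings (e.g. from an LLM).
    w: (dim,) hypothetical learnable scoring vector for the weighted path.
    """
    # Global path: uniform mean pooling over all tokens.
    global_vec = token_embeddings.mean(axis=0)

    # Weighted path: softmax scores emphasise the more informative tokens.
    scores = token_embeddings @ w                 # (seq_len,)
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()
    weighted_vec = alphas @ token_embeddings      # (dim,)

    # Concatenate the two paths into the final text representation.
    return np.concatenate([global_vec, weighted_vec])

rng = np.random.default_rng(0)
tokens = rng.standard_normal((12, 8))   # 12 tokens, 8-dim embeddings
w = rng.standard_normal(8)
vec = dual_path_aggregate(tokens, w)
print(vec.shape)  # (16,)
```

The global path preserves overall context while the weighted path lets rare but discriminative tokens dominate, which is one plausible reason the paper reports gains on infrequent classes.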

📝 Abstract
This study proposes a text classification algorithm based on large language models, aiming to address the limitations of traditional methods in capturing long-range dependencies, understanding contextual semantics, and handling class imbalance. The framework comprises text encoding, contextual representation modeling, attention-based enhancement, feature aggregation, and classification prediction. In the representation stage, deep semantic embeddings are obtained from large-scale pretrained language models, and attention mechanisms enhance the selective representation of key features. In the aggregation stage, global and weighted strategies are combined to generate robust text-level vectors. In the classification stage, a fully connected layer with a Softmax output predicts class distributions, and cross-entropy loss is used to optimize model parameters. Comparative experiments include multiple baselines, covering recurrent neural networks, graph neural networks, and Transformers, evaluated on Precision, Recall, F1-Score, and AUC. Results show that the proposed method outperforms these models on all metrics, with especially strong improvements in Recall and AUC. In addition, sensitivity experiments on hyperparameters and data conditions examine the impact of hidden dimensions on AUC and of class-imbalance ratios on Recall. The findings show that proper model configuration has a significant effect on performance and demonstrate the model's adaptability and stability under different conditions. Overall, the proposed method not only delivers measurable performance improvements but also, through systematic analysis, verifies its robustness and applicability in complex data environments.
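The classification stage (fully connected layer, Softmax output, cross-entropy optimization) can be sketched as follows. This is a generic illustration of that standard head, not the paper's code; the weight shapes and the number of classes are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(text_vec, W, b):
    """Fully connected layer followed by Softmax over classes."""
    return softmax(text_vec @ W + b)

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -np.log(probs[label])

rng = np.random.default_rng(1)
text_vec = rng.standard_normal(16)   # aggregated text-level vector
W = rng.standard_normal((16, 4))     # 4 hypothetical classes
b = np.zeros(4)

probs = classify(text_vec, W, b)     # class probability distribution
loss = cross_entropy(probs, label=2) # loss to backpropagate in training
```

In training, the gradient of this loss with respect to the logits is simply `probs` with 1 subtracted at the true class, which is why the Softmax plus cross-entropy pairing is the default choice here.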
Problem

Research questions and friction points this paper is trying to address.

Improves text classification using large language models and attention mechanisms
Addresses long-range dependencies and contextual semantics in text
Handles class imbalance and enhances model robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large language models for deep semantic embeddings
Attention mechanisms for key feature representation
Global and weighted aggregation for robust vectors
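The attention mechanism listed above builds on scaled dot-product attention, the core operation inside multi-head self-attention. A minimal single-head sketch, with arbitrary toy dimensions (the projections that would produce distinct Q, K, V in a real model are omitted):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # (n, n) token-pair affinities
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                        # each row: weighted mix of values

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 8))   # 6 tokens, 8-dim embeddings
out = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V = X
print(out.shape)  # (6, 8)
```

Because every output row mixes information from all tokens, self-attention gives each position a direct path to every other position, which is what lets the model capture the long-range dependencies the paper targets.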
Authors

Ning Lyu (Carnegie Mellon University)
Yuxi Wang (Ocean University of China)
Feng Chen (Northeastern University)
Qingyuan Zhang (Boston University)