Small Language Models in the Real World: Insights from Industrial Text Classification

📅 2025-05-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Lightweight decoder-only small language models (≤7B parameters) face critical bottlenecks—low inference efficiency, high prompt sensitivity, and excessive GPU memory consumption—when deployed for industrial text classification tasks (e.g., emails, legal documents, and ultra-long academic texts). Method: We propose a VRAM-aware evaluation framework integrating dynamic prompt template design, task-adapted supervised fine-tuning, and joint memory-throughput benchmarking. Contribution/Results: This work presents the first cross-task, long-text industrial evaluation of small models’ inference efficiency and classification robustness. It reveals a nonlinear trade-off between prompt quality and model scale. Experiments across three real-world tasks achieve >92% accuracy, with 60% GPU memory reduction and 45% latency improvement over generic prompting—demonstrating strong viability for on-device or edge-deployed lightweight NLP systems.
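The joint memory-throughput benchmarking the summary describes can be sketched as a small harness that measures per-document latency and peak memory around a classification callable. This is a minimal illustration, not the paper's framework: the `toy_classify` rule stands in for a real small model, and `tracemalloc` tracks host memory as a stand-in where a GPU deployment would read `torch.cuda.max_memory_allocated()`.

```python
import time
import tracemalloc

def benchmark(classify, texts):
    """Measure per-document latency and peak memory for a classifier.

    Sketch only: tracemalloc reports host-memory peak; a VRAM-aware
    setup would instead query the GPU allocator (e.g. via PyTorch).
    """
    tracemalloc.start()
    start = time.perf_counter()
    labels = [classify(t) for t in texts]
    latency = (time.perf_counter() - start) / len(texts)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return labels, latency, peak

# Toy stand-in classifier: a keyword rule instead of a real small model.
def toy_classify(text):
    return "legal" if "contract" in text else "email"

labels, latency, peak = benchmark(
    toy_classify, ["a contract clause", "hi team, quick update"]
)
```

Reporting both numbers from the same run is what makes the memory-latency trade-off visible when comparing prompting strategies or model sizes.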

📝 Abstract
With the emergence of ChatGPT, Transformer models have significantly advanced text classification and related tasks. Decoder-only models such as Llama exhibit strong performance and flexibility, yet they suffer from inefficient inference due to token-by-token generation, and their effectiveness in text classification heavily depends on prompt quality. Moreover, their substantial GPU resource requirements often limit widespread adoption. Thus, whether smaller language models can effectively handle text classification tasks emerges as a question of significant interest. However, the selection of appropriate models and methodologies remains largely underexplored. In this paper, we conduct a comprehensive evaluation of prompt engineering and supervised fine-tuning methods for transformer-based text classification. Specifically, we focus on practical industrial scenarios, including email classification, legal document categorization, and the classification of extremely long academic texts. We examine the strengths and limitations of smaller models, with particular attention to both their performance and their efficiency in Video Random-Access Memory (VRAM) utilization, thereby providing valuable insights for the local deployment and application of compact models in industrial settings.
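Since the abstract stresses that classification quality hinges on prompt design, a dynamic prompt template for this setting can be sketched as below. The label set, wording, and truncation limit are illustrative assumptions, not the paper's actual templates.

```python
# Hypothetical label set for the three industrial scenarios mentioned
# in the abstract; the real templates are not given in this summary.
LABELS = ["email", "legal", "academic"]

def build_prompt(text, labels=LABELS, max_chars=2000):
    """Build a classification prompt for a small decoder-only model.

    Truncates ultra-long inputs so the prompt fits a compact model's
    context window (the 2000-character cap is an arbitrary example).
    """
    snippet = text[:max_chars]
    label_list = ", ".join(labels)
    return (
        f"Classify the following document into one of: {label_list}.\n"
        f"Document:\n{snippet}\n"
        "Answer with only the label."
    )

prompt = build_prompt("Dear counsel, attached is the signed contract ...")
```

Constraining the model to answer with only the label keeps token-by-token generation short, which is one simple way prompt design can reduce the inference cost the abstract highlights.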
Problem

Research questions and friction points this paper is trying to address.

Evaluating small language models for industrial text classification tasks
Assessing efficiency and VRAM usage of compact models in real-world applications
Exploring prompt engineering and fine-tuning methods for transformer-based classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates prompt engineering and fine-tuning methods
Focuses on small models for industrial text classification
Analyzes VRAM efficiency for local deployment