In a Few Words: Comparing Weak Supervision and LLMs for Short Query Intent Classification

📅 2025-04-30
🤖 AI Summary
This paper addresses short-query intent classification (informational / navigational / transactional) by systematically comparing weak supervision (Snorkel + ORCAS-I) against large language models (LLaMA-3.1-8B/70B-Instruct). Experiments show that LLMs achieve substantially higher recall (+12.3%) but suffer a severe precision drop (−18.6%), exposing an accuracy bottleneck in short-query intent recognition. The work adopts a balanced evaluation perspective that weighs precision and recall jointly, and uses both in-context learning and fine-tuning to probe the limits of direct LLM-based classification for short queries. Key contributions: (1) the first empirical comparison of LLMs versus weak supervision for short-query intent classification; (2) identification of the inherent precision–recall trade-off in LLM-based intent modeling; and (3) a more balanced evaluation framework for search intent modeling.
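The precision-recall trade-off the summary highlights is easiest to see with per-class scores. Below is a minimal sketch in plain Python of how precision, recall, and F1 are computed for each of the three intent classes; the `gold` and `pred` label lists are invented toy data, not results from the paper.

```python
# Hypothetical gold and predicted intent labels for a handful of short
# queries; the paper's three classes are informational, navigational,
# and transactional. The data here is illustrative only.
gold = ["informational", "navigational", "transactional",
        "informational", "navigational", "informational"]
pred = ["informational", "informational", "transactional",
        "informational", "navigational", "transactional"]

def per_class_scores(gold, pred, label):
    """Precision, recall, and F1 for one intent class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

for label in ("informational", "navigational", "transactional"):
    p, r, f = per_class_scores(gold, pred, label)
    print(f"{label:>14}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

A model can raise recall for a class simply by predicting it more often, which inflates false positives for that class and drives precision down, exactly the pattern the paper reports for the LLMs.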

📝 Abstract
User intent classification is an important task in information retrieval. Previously, user intents were classified manually and automatically; the latter helped to avoid hand labelling of large datasets. Recent studies explored whether LLMs can reliably determine user intent. However, researchers have recognized the limitations of using generative LLMs for classification tasks. In this study, we empirically compare user intent classification into informational, navigational, and transactional categories, using weak supervision and LLMs. Specifically, we evaluate LLaMA-3.1-8B-Instruct and LLaMA-3.1-70B-Instruct for in-context learning and LLaMA-3.1-8B-Instruct for fine-tuning, comparing their performance to an established baseline classifier trained using weak supervision (ORCAS-I). Our results indicate that while LLMs outperform weak supervision in recall, they continue to struggle with precision, which shows the need for improved methods to balance both metrics effectively.
Problem

Research questions and friction points this paper is trying to address.

Compare weak supervision and LLMs for short query intent classification
Evaluate LLMs' performance in informational, navigational, transactional intent classification
Assess precision and recall trade-offs in LLM-based intent classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compare weak supervision and LLMs for intent classification
Evaluate LLaMA models for in-context learning and fine-tuning
Highlight LLMs' recall advantage but precision challenges
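The in-context-learning setup evaluated above can be sketched as a few-shot classification prompt for an instruct-tuned model. The prompt wording and the demonstration queries below are illustrative assumptions, not the prompt used in the paper.

```python
# Hypothetical few-shot prompt builder for query intent classification
# with an instruct-tuned LLM (e.g. LLaMA-3.1-8B-Instruct). The template
# and example queries are assumptions for illustration.
LABELS = ("informational", "navigational", "transactional")

FEW_SHOT = [
    ("python list comprehension", "informational"),
    ("facebook login", "navigational"),
    ("buy wireless headphones", "transactional"),
]

def build_prompt(query: str) -> str:
    """Assemble a few-shot prompt; the model is expected to complete
    the final 'Intent:' line with one label."""
    lines = [
        "Classify the search query into exactly one intent: "
        + ", ".join(LABELS) + ".",
        "Answer with the label only.",
        "",
    ]
    for q, label in FEW_SHOT:
        lines.append(f"Query: {q}\nIntent: {label}")
    lines.append(f"Query: {query}\nIntent:")
    return "\n".join(lines)

print(build_prompt("weather amsterdam"))
```

Constraining the model to answer with a bare label is what makes the generative model usable as a classifier; the precision problems the paper reports arise even under such constrained prompting.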
Daria Alexander
Radboud University
information retrieval · machine learning · data science · NLP
A. D. Vries
Radboud University, Nijmegen, the Netherlands