Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection

📅 2024-12-29

📈 Citations: 0

✨ Influential: 0

career value

159K/year

🤖 AI Summary

Large language models (e.g., GPT-4) suffer substantial performance degradation in cross-domain genre classification and AI-generated text detection due to domain shift, primarily stemming from overreliance on topical cues. To address this, we propose a controllable feature selection mechanism that integrates a controllable prediction indicator into few-shot in-context learning, enabling stylistic feature focusing while suppressing topic-based signals. Our approach is the first to jointly combine controllable feature masking with style-oriented prompt design, augmented by a topic-agnostic representation constraint. Experiments demonstrate that our method improves cross-domain generalization by up to 20 percentage points on both multi-genre classification and AI-text detection tasks, significantly narrowing the out-of-distribution (OOD) performance gap. It thereby overcomes key limitations of conventional chain-of-thought (CoT) methods in domain transfer, establishing a new state of the art in robust, style-aware language understanding.

Technology Category

Application Category

📝 Abstract

This study demonstrates that the modern generation of Large Language Models (LLMs, such as GPT-4) suffers from the same out-of-domain (OOD) performance gap observed in prior research on pre-trained Language Models (PLMs, such as BERT). We demonstrate this across two non-topical classification tasks: 1) genre classification and 2) generated text detection. Our results show that when demonstration examples for In-Context Learning (ICL) come from one domain (e.g., travel) and the system is tested on another domain (e.g., history), classification performance declines significantly. To address this, we introduce a method that controls which predictive indicators are used and which are excluded during classification. For the two tasks studied here, this ensures that topical features are omitted, while the model is guided to focus on stylistic rather than content-based attributes. This approach reduces the OOD gap by up to 20 percentage points in a few-shot setup. Straightforward Chain-of-Thought (CoT) methods, used as the baseline, prove insufficient, while our approach consistently enhances domain transfer performance.

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Textual Genre Recognition

Performance Degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-domain tasks

Style-focused approach

Performance enhancement

🔎 Similar Papers

Learning to Rewrite: Generalized LLM-Generated Text Detection