Catch Me If You Can? Not Yet: LLMs Still Struggle to Imitate the Implicit Writing Styles of Everyday Authors

📅 2025-09-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether large language models (LLMs) can accurately emulate the implicit writing styles of ordinary individuals from only a few examples. Method: We propose a multidimensional evaluation framework integrating author attribution, style matching, human validation, and AI detection—overcoming limitations of single-metric assessments—and employ in-context learning with diverse prompting strategies. Experiments span over 400 real authors and >40,000 generated samples across four text genres: news, email, forum posts, and blogs. Contribution/Results: LLMs demonstrate moderate structural style fidelity in formal, rule-governed genres (e.g., email, news), but exhibit significant degradation in informal, idiosyncratic genres (e.g., forums, blogs), revealing fundamental deficiencies in modeling implicit stylistic cues. To our knowledge, this is the first large-scale empirical study exposing the critical bottlenecks of current LLMs in few-shot personal style imitation. It establishes a reproducible evaluation paradigm and provides benchmark data to advance controllable stylistic generation.
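The few-shot setup described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the function name, prompt wording, and demonstration format are all assumptions about how user samples might be placed in context for style imitation.

```python
# Hypothetical sketch of few-shot in-context style imitation:
# a handful of user-authored samples are placed in the prompt and the
# model is asked to write a new text in the same implicit style.

def build_style_imitation_prompt(author_samples, task_instruction):
    """Assemble an in-context learning prompt from a few user samples."""
    # Number each demonstration so the model can treat them as a set.
    demos = "\n\n".join(
        f"Example {i + 1}:\n{text}" for i, text in enumerate(author_samples)
    )
    return (
        "Below are writing samples from one author. Imitate their "
        "implicit style (tone, rhythm, word choice) in the new text.\n\n"
        f"{demos}\n\n"
        f"Task: {task_instruction}"
    )

prompt = build_style_imitation_prompt(
    ["Honestly can't believe the game last night...",
     "Quick update: the garden survived the frost, barely."],
    "Write a short blog post about a weekend trip.",
)
```

Varying the number of samples passed in (the "number of demonstrations" prompting strategy the summary mentions) is then a one-line change at the call site.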

📝 Abstract
As large language models (LLMs) become increasingly integrated into personal writing tools, a critical question arises: can LLMs faithfully imitate an individual's writing style from just a few examples? Personal style is often subtle and implicit, making it difficult to specify through prompts yet essential for user-aligned generation. This work presents a comprehensive evaluation of state-of-the-art LLMs' ability to mimic personal writing styles via in-context learning from a small number of user-authored samples. We introduce an ensemble of complementary metrics, including authorship attribution, authorship verification, style matching, and AI detection, to robustly assess style imitation. Our evaluation spans over 40,000 generations per model across domains such as news, email, forums, and blogs, covering writing samples from more than 400 real-world authors. Results show that while LLMs can approximate user styles in structured formats like news and email, they struggle with nuanced, informal writing in blogs and forums. Further analysis of various prompting strategies, such as the number of demonstrations, reveals key limitations in effective personalization. Our findings highlight a fundamental gap in personalized LLM adaptation and the need for improved techniques to support implicit, style-consistent generation. To aid future research and for reproducibility, we open-source our data and code.
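The ensemble evaluation the abstract describes could be combined roughly as below. This is an illustrative sketch, not the paper's actual scoring code: the equal-weight averaging scheme and the convention of inverting the AI-detection rate are assumptions.

```python
# Illustrative combination of the four complementary signals listed in
# the abstract. Higher is better for each input; AI-detection rate is
# inverted because a well-imitated style should evade detection.

def ensemble_style_score(attribution_acc, verification_score,
                         style_match, ai_detection_rate):
    """Average complementary metrics into one style-imitation score."""
    for v in (attribution_acc, verification_score,
              style_match, ai_detection_rate):
        if not 0.0 <= v <= 1.0:
            raise ValueError("all metrics must lie in [0, 1]")
    return (attribution_acc + verification_score +
            style_match + (1.0 - ai_detection_rate)) / 4.0
```

An ensemble like this guards against the single-metric failure mode the paper criticizes: a model that fools an AI detector but matches the wrong author still scores poorly overall.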
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to imitate personal writing styles
Assessing style imitation via in-context learning from user samples
Identifying limitations in mimicking nuanced informal writing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating LLMs' style imitation via in-context learning
Introducing ensemble metrics for robust style assessment
Testing over 40,000 generations per model across multiple domains