Catch Me If You Can? Not Yet: LLMs Still Struggle to Imitate the Implicit Writing Styles of Everyday Authors

📅 2025-09-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether large language models (LLMs) can accurately emulate the implicit writing styles of ordinary individuals from only a few examples. Method: We propose a multidimensional evaluation framework integrating author attribution, style matching, human validation, and AI detection—overcoming limitations of single-metric assessments—and employ in-context learning with diverse prompting strategies. Experiments span over 400 real authors and >40,000 generated samples across four text genres: news, email, forum posts, and blogs. Contribution/Results: LLMs demonstrate moderate structural style fidelity in formal, rule-governed genres (e.g., email, news), but exhibit significant degradation in informal, idiosyncratic genres (e.g., forums, blogs), revealing fundamental deficiencies in modeling implicit stylistic cues. To our knowledge, this is the first large-scale empirical study exposing the critical bottlenecks of current LLMs in few-shot personal style imitation. It establishes a reproducible evaluation paradigm and provides benchmark data to advance controllable stylistic generation.
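The few-shot setup described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the function name, prompt wording, and demonstration format are all assumptions about how user samples might be placed in context for style imitation.

```python
# Hypothetical sketch of few-shot in-context style imitation:
# a handful of user-authored samples are placed in the prompt and the
# model is asked to write a new text in the same implicit style.

def build_style_imitation_prompt(author_samples, task_instruction):
    """Assemble an in-context learning prompt from a few user samples."""
    # Number each demonstration so the model can treat them as a set.
    demos = "\n\n".join(
        f"Example {i + 1}:\n{text}" for i, text in enumerate(author_samples)
    )
    return (
        "Below are writing samples from one author. Imitate their "
        "implicit style (tone, rhythm, word choice) in the new text.\n\n"
        f"{demos}\n\n"
        f"Task: {task_instruction}"
    )

prompt = build_style_imitation_prompt(
    ["Honestly can't believe the game last night...",
     "Quick update: the garden survived the frost, barely."],
    "Write a short blog post about a weekend trip.",
)
```

Varying the number of samples passed in (the "number of demonstrations" prompting strategy the summary mentions) is then a one-line change at the call site.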

📝 Abstract
As large language models (LLMs) become increasingly integrated into personal writing tools, a critical question arises: can LLMs faithfully imitate an individual's writing style from just a few examples? Personal style is often subtle and implicit, making it difficult to specify through prompts yet essential for user-aligned generation. This work presents a comprehensive evaluation of state-of-the-art LLMs' ability to mimic personal writing styles via in-context learning from a small number of user-authored samples. We introduce an ensemble of complementary metrics, including authorship attribution, authorship verification, style matching, and AI detection, to robustly assess style imitation. Our evaluation spans over 40,000 generations per model across domains such as news, email, forums, and blogs, covering writing samples from more than 400 real-world authors. Results show that while LLMs can approximate user styles in structured formats like news and email, they struggle with nuanced, informal writing in blogs and forums. Further analysis of various prompting strategies, such as the number of demonstrations, reveals key limitations in effective personalization. Our findings highlight a fundamental gap in personalized LLM adaptation and the need for improved techniques to support implicit, style-consistent generation. To aid future research and for reproducibility, we open-source our data and code.
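The ensemble evaluation the abstract describes could be combined roughly as below. This is an illustrative sketch, not the paper's actual scoring code: the equal-weight averaging scheme and the convention of inverting the AI-detection rate are assumptions.

```python
# Illustrative combination of the four complementary signals listed in
# the abstract. Higher is better for each input; AI-detection rate is
# inverted because a well-imitated style should evade detection.

def ensemble_style_score(attribution_acc, verification_score,
                         style_match, ai_detection_rate):
    """Average complementary metrics into one style-imitation score."""
    for v in (attribution_acc, verification_score,
              style_match, ai_detection_rate):
        if not 0.0 <= v <= 1.0:
            raise ValueError("all metrics must lie in [0, 1]")
    return (attribution_acc + verification_score +
            style_match + (1.0 - ai_detection_rate)) / 4.0
```

An ensemble like this guards against the single-metric failure mode the paper criticizes: a model that fools an AI detector but matches the wrong author still scores poorly overall.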
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to imitate personal writing styles
Assessing style imitation via in-context learning from user samples
Identifying limitations in mimicking nuanced informal writing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluating LLMs' style imitation via in-context learning
Introducing ensemble metrics for robust style assessment
Testing over 40,000 generations per model across multiple domains