🤖 AI Summary
This study addresses zero-shot sentence-level writing style change detection—a particularly challenging task in authorship analysis. We evaluate four state-of-the-art large language models (LLMs) in a zero-shot setting on the official PAN 2024/2025 datasets and design semantic-controlled experiments to disentangle content from stylistic signals. Results demonstrate that LLMs effectively capture content-invariant stylistic features, achieving significantly higher accuracy than the competition’s recommended baselines without any fine-tuning. Crucially, this work provides the first empirical evidence that current top-tier LLMs exhibit high sensitivity to fine-grained stylistic variations, enabling robust multi-author style boundary detection in an unsupervised, zero-shot manner. These findings establish a novel paradigm for LLM-driven unsupervised style analysis and deliver critical empirical support for deploying off-the-shelf LLMs in practical authorship attribution and style segmentation tasks.
📝 Abstract
This article explores the zero-shot performance of state-of-the-art large language models (LLMs) on one of the most challenging tasks in authorship analysis: sentence-level style change detection. Benchmarking four LLMs on the official PAN~2024 and 2025 "Multi-Author Writing Style Analysis" datasets, we present several observations. First, state-of-the-art generative models are sensitive to variations in writing style - even at the granular level of individual sentences. Second, their accuracy establishes a challenging baseline for the task, outperforming suggested baselines of the PAN competition. Finally, we explore the influence of semantics on model predictions and present evidence suggesting that the latest generation of LLMs may be more sensitive to content-independent and purely stylistic signals than previously reported.