Who Wrote This Line? Evaluating the Detection of LLM-Generated Classical Chinese Poetry

📅 2026-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing AI-generated text detection methods struggle to identify classical Chinese poetry produced by large language models (LLMs) because of the genre's strict metrical constraints, shared poetic imagery, and flexible syntactic structures. To address this gap, this work introduces ChangAn, the first large-scale benchmark designed specifically for detecting LLM-generated classical Chinese poetry. It comprises 30,664 poems: 10,276 human-authored and 20,388 generated by four state-of-the-art LLMs. The study systematically evaluates 12 Chinese AI detectors across varying textual granularities and generation strategies. The experiments show that current detection tools perform poorly on this highly constrained literary form, which supports the necessity of the ChangAn benchmark and points to a new direction for research on detecting AI-generated Chinese text.
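
As a quick sanity check on the reported counts, the two classes do sum to 30,664, and there are roughly two generated poems for every human-written one. One consequence is that a trivial detector that labels every poem "LLM-generated" would already reach about 66.5% accuracy, which is one reason imbalance-aware metrics are often preferred for benchmarks like this. The snippet below is only an illustrative calculation from the stated numbers; the helper name `composition` is hypothetical, and the metrics the paper actually reports are not stated in this summary.

```python
# Class counts as reported for ChangAn in the summary above.
HUMAN = 10_276
GENERATED = 20_388
TOTAL = 30_664


def composition(human: int, generated: int) -> dict:
    """Return the dataset size, each class's share, and the accuracy of a
    trivial detector that labels every poem as LLM-generated."""
    total = human + generated
    return {
        "total": total,
        "human_share": human / total,
        "generated_share": generated / total,
        # Always predicting the majority class ("generated") is correct
        # exactly as often as generated poems occur in the dataset.
        "majority_baseline_accuracy": generated / total,
    }


stats = composition(HUMAN, GENERATED)
assert stats["total"] == TOTAL  # the reported counts are consistent
print(f"{stats['human_share']:.1%} human, {stats['generated_share']:.1%} generated")
```

Running it prints `33.5% human, 66.5% generated`.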

📝 Abstract

The rapid development of large language models (LLMs) has extended text generation tasks into the literary domain. However, AI-generated literary works have raised increasingly prominent issues of creative authenticity and ethics in the literary world, making the detection of LLM-generated literary texts essential and urgent. While previous works have made significant progress in detecting AI-generated text, they have yet to address classical Chinese poetry. Owing to the unique linguistic features of classical Chinese poetry, such as strict metrical regularity, a shared system of poetic imagery, and flexible syntax, determining whether a poem was written by AI presents a substantial challenge. To address these issues, we introduce ChangAn, a benchmark for detecting LLM-generated classical Chinese poetry that contains 30,664 poems in total: 10,276 human-written poems and 20,388 poems generated by four popular LLMs. Based on ChangAn, we conducted a systematic evaluation of 12 AI detectors, investigating how their performance varies across different text granularities and generation strategies. Our findings highlight the limitations of current Chinese text detectors, which fail to serve as reliable tools for detecting LLM-generated classical Chinese poetry. These results validate the effectiveness and necessity of our proposed ChangAn benchmark. Our dataset and code are available at https://github.com/VelikayaScarlet/ChangAn.
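
The abstract reports detector performance "across different text granularities" without spelling out the protocol. The sketch below illustrates one plausible setup under stated assumptions: two granularities (the whole poem and each punctuation-delimited line), a `detector` callable that returns a higher score for text it considers more likely LLM-generated, labels where 1 means generated and 0 means human, and AUROC as the metric. The splitting rule, the function names `split_lines`, `auroc`, and `evaluate`, and the choice of AUROC are all hypothetical illustrations, not the paper's confirmed method.

```python
import re
from typing import Callable, Sequence

# Assumption: poems use full-width punctuation, and each line ends with one
# of these marks. The zero-width lookbehind keeps the mark on its line.
_LINE_END = re.compile(r"(?<=[，。！？；])")


def split_lines(poem: str) -> list[str]:
    """Split a poem into punctuation-delimited lines (hypothetical rule)."""
    return [seg.strip() for seg in _LINE_END.split(poem) if seg.strip()]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random generated text (label 1) outscores a random
    human text (label 0), counting ties as half. Quadratic, for clarity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(detector: Callable[[str], float],
             poems: Sequence[str], labels: Sequence[int]) -> dict[str, float]:
    """AUROC at poem level and at line level; every line inherits the
    label of the poem it came from."""
    poem_scores = [detector(p) for p in poems]
    line_scores: list[float] = []
    line_labels: list[int] = []
    for poem, y in zip(poems, labels):
        for line in split_lines(poem):
            line_scores.append(detector(line))
            line_labels.append(y)
    return {"poem": auroc(poem_scores, labels),
            "line": auroc(line_scores, line_labels)}
```

For example, `split_lines("床前明月光，疑是地上霜。")` yields the two lines `"床前明月光，"` and `"疑是地上霜。"`. Scoring individual lines gives the detector far less text per decision than scoring whole poems, which is one reason performance can differ by granularity. For a dataset the size of ChangAn, a rank-based AUROC (for example, scikit-learn's `roc_auc_score`) would be preferable to the quadratic loop used here for readability.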

Problem

Research questions and friction points this paper is trying to address.

LLM-generated poetry

classical Chinese poetry

AI text detection

literary authenticity

text generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

classical Chinese poetry

LLM-generated text detection

ChangAn benchmark