🤖 AI Summary
This study addresses the problem of modeling, generating, and interpreting the linguistic drivers of ad text attractiveness. Methodologically, we introduce AdParaphrase v2.0, a large-scale ad text paraphrase dataset with human preference annotations: 16,460 pairs, each labeled by ten crowd workers, making it 20× larger than AdParaphrase v1.0. We combine statistical linguistic analysis with LLM-based reference-free evaluation to systematically identify linguistic features that enhance attractiveness (e.g., emotional intensity, concreteness, syntactic diversity) and demonstrate that human preference annotations can effectively guide the optimization of generative models. Results show that models trained on our dataset significantly improve ad click-through rates; moreover, LLM-based evaluation achieves high agreement with human preferences (Spearman’s ρ = 0.82). The dataset and code are publicly released, establishing a new benchmark for ad copy generation and interpretable AI research.
📝 Abstract
Identifying the factors that make ad text attractive is essential for advertising success. This study proposes AdParaphrase v2.0, a dataset for ad text paraphrasing that contains human preference data, to enable analysis of these linguistic factors and to support the development of methods for generating attractive ad texts. Compared with v1.0, this dataset is 20 times larger, comprising 16,460 ad text paraphrase pairs, each annotated with preference data from ten evaluators, thereby enabling a more comprehensive and reliable analysis. Through experiments, we identified multiple linguistic features of engaging ad texts that were not observed in v1.0 and explored various methods for generating attractive ad texts. Furthermore, our analysis demonstrated the relationship between human preference and ad performance, and highlighted the potential of reference-free metrics based on large language models for evaluating ad text attractiveness. The dataset is publicly available at: https://github.com/CyberAgentAILab/AdParaphrase-v2.0.
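
As a rough illustration of how agreement between a reference-free metric and human preference might be measured, the sketch below aggregates ten evaluators' binary votes per paraphrase pair into a preference share and then computes Spearman's rank correlation against a metric score. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the vote encoding (1 = paraphrase preferred), the function names, and all numbers are hypothetical toy values and are not drawn from AdParaphrase v2.0.

```python
from statistics import mean


def preference_share(votes):
    """Fraction of evaluators who preferred the paraphrase (assumed encoding: 1 = preferred)."""
    return sum(votes) / len(votes)


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (hypothetical): four paraphrase pairs, ten votes each.
votes_per_pair = [
    [1] * 8 + [0] * 2,
    [1] * 3 + [0] * 7,
    [1] * 5 + [0] * 5,
    [1] * 9 + [0] * 1,
]
shares = [preference_share(v) for v in votes_per_pair]

# Hypothetical scores from some LLM-based reference-free metric.
metric_scores = [0.7, 0.2, 0.65, 0.6]

rho = spearman(shares, metric_scores)
print(f"Spearman's rho on toy data: {rho:.2f}")
```

Rank correlation is a natural fit here because it only asks whether the metric orders the ad texts the same way the evaluators do, without assuming the two scales are linearly related.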