AdParaphrase v2.0: Generating Attractive Ad Texts Using a Preference-Annotated Paraphrase Dataset

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

career value

170K/year

🤖 AI Summary

This study addresses the problem of modeling, generating, and interpreting linguistic drivers of advertising text attractiveness. Methodologically, we introduce the first large-scale preference-annotated advertising text rewriting dataset (16,460 pairs), featuring ten-worker crowd-sourced preference labeling—20× larger than prior datasets. We integrate statistical linguistics analysis with LLM-based reference-free evaluation to systematically identify key linguistic features enhancing attractiveness (e.g., emotional intensity, concreteness, syntactic diversity) and demonstrate that human preference annotations effectively guide generative model optimization. Results show that models trained on our dataset significantly improve ad click-through rates; moreover, LLM-based evaluation achieves high agreement with human preferences (Spearman’s ρ = 0.82). The dataset and code are publicly released, establishing a new benchmark for advertising copy generation and interpretable AI research.

Technology Category

Application Category

📝 Abstract

Identifying factors that make ad text attractive is essential for advertising success. This study proposes AdParaphrase v2.0, a dataset for ad text paraphrasing, containing human preference data, to enable the analysis of the linguistic factors and to support the development of methods for generating attractive ad texts. Compared with v1.0, this dataset is 20 times larger, comprising 16,460 ad text paraphrase pairs, each annotated with preference data from ten evaluators, thereby enabling a more comprehensive and reliable analysis. Through the experiments, we identified multiple linguistic features of engaging ad texts that were not observed in v1.0 and explored various methods for generating attractive ad texts. Furthermore, our analysis demonstrated the relationships between human preference and ad performance, and highlighted the potential of reference-free metrics based on large language models for evaluating ad text attractiveness. The dataset is publicly available at: https://github.com/CyberAgentAILab/AdParaphrase-v2.0.

Problem

Research questions and friction points this paper is trying to address.

Identifying linguistic factors for attractive ad texts

Developing methods to generate engaging ad paraphrases

Analyzing human preference impact on ad performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Larger annotated ad text paraphrase dataset

Identified engaging ad text linguistic features

Reference-free metrics using large language models

🔎 Similar Papers

AdTEC: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising

2024-08-12arXiv.orgCitations: 0