AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Understanding which linguistic features make ad texts attractive is difficult because human preferences depend on both content and style, and because public ad text datasets with human preference annotations are scarce. Method: We construct AdParaphrase, a paraphrase dataset of ad text pairs that are semantically equivalent but differ in wording and style, annotated with human preferences, and use it to analyze how differences in linguistic features relate to preference. Contribution/Results: Our analysis shows that ad texts preferred by human judges have higher fluency, longer length, more nouns, and bracket symbols. An ad text generation model that takes these findings into account significantly improves the attractiveness of a given text. The AdParaphrase dataset is publicly released.

📝 Abstract
Effective linguistic choices that attract potential customers play crucial roles in advertising success. This study aims to explore the linguistic features of ad texts that influence human preferences. Although the creation of attractive ad texts is an active area of research, progress in understanding the specific linguistic features that affect attractiveness is hindered by several obstacles. First, human preferences are complex and influenced by multiple factors, including the content of an ad text, such as brand names, and its linguistic style, which makes analysis challenging. Second, publicly available ad text datasets that include human preference signals, such as ad performance metrics and human feedback, which reflect people's interests, are lacking. To address these problems, we present AdParaphrase, a paraphrase dataset that contains human preferences for pairs of ad texts that are semantically equivalent but differ in terms of wording and style. This dataset allows for preference analysis that focuses on the differences in linguistic features. Our analysis revealed that ad texts preferred by human judges have higher fluency, longer length, more nouns, and use of bracket symbols. Furthermore, we demonstrate that an ad text-generation model that considers these findings significantly improves the attractiveness of a given text. The dataset is publicly available at: https://github.com/CyberAgentAILab/AdParaphrase.
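
The kind of paired preference analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the `Pair` type, the `preference_rate` helper, the bracket character set, and the three toy English pairs are all assumptions invented for illustration. Only two surface features (character length and bracket use) are shown; fluency and noun counts would need a language model and a part-of-speech tagger, which are left out here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical set of bracket characters; the paper's exact definition may differ.
BRACKETS = set("()[]{}<>【】「」『』")


@dataclass
class Pair:
    """One paraphrase pair: the text judges preferred and the alternative."""
    preferred: str
    other: str


def length(text: str) -> int:
    """Surface feature: number of characters."""
    return len(text)


def has_bracket(text: str) -> int:
    """Surface feature: 1 if any bracket symbol appears, else 0."""
    return int(any(ch in BRACKETS for ch in text))


def preference_rate(pairs: Sequence[Pair], feature: Callable[[str], int]) -> float:
    """Among pairs where the feature differs, the fraction in which the
    preferred text has the higher feature value. Ties are ignored."""
    higher = lower = 0
    for pair in pairs:
        a, b = feature(pair.preferred), feature(pair.other)
        if a > b:
            higher += 1
        elif a < b:
            lower += 1
    decided = higher + lower
    return higher / decided if decided else float("nan")


# Toy data made up for this sketch; not taken from AdParaphrase.
TOY_PAIRS = [
    Pair("[Free shipping] Order fresh coffee beans today",
         "Order coffee today, ships free"),
    Pair("Summer sale (up to 50% off) on all shoes",
         "Shoes up to 50% off this summer"),
    Pair("Book a hotel now",
         "[Official] Book your hotel room online now"),
]

if __name__ == "__main__":
    for name, feature in [("length", length), ("bracket", has_bracket)]:
        print(name, preference_rate(TOY_PAIRS, feature))
```

A rate well above 0.5 on real data would suggest that the feature is associated with preference, although a significance test would be needed before drawing conclusions.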

Problem

Research questions and friction points this paper is trying to address.

Analyzing linguistic features in ad texts
Exploring human preferences for ad attractiveness
Developing models for generating attractive ad texts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Paraphrase dataset of ad texts annotated with human preferences
Analysis of linguistic features associated with human preferences
Ad text generation model that uses these findings to improve attractiveness