🤖 AI Summary
This study systematically examines differences in persuasive efficacy between large language models (LLMs) and humans. Through a meta-analysis of 7 empirical studies yielding 12 effect-size estimates, employing Hedges' *g* effect sizes, Egger's regression test, and the trim-and-fill method, we find no statistically significant difference in overall persuasiveness (*g* = 0.02, *p* = .530). The key contribution lies in identifying the joint moderating role of contextual factors: model type, dialogue design, and application domain. Considered together in a combined model, these factors account for 81.93% of effect-size heterogeneity, substantially exceeding the explanatory power of any single moderator. Results indicate that LLMs' persuasive effectiveness is determined not by whether they replace humans, but by systematic optimization of task alignment and interaction design. These findings provide empirical grounding and a theoretical framework for the credible, effective deployment of LLMs in domains such as political communication and digital marketing.
📝 Abstract
Large language models (LLMs) are increasingly used for persuasion, such as in political communication and marketing, where they affect how people think, choose, and act. Yet empirical findings on how effectively LLMs persuade compared to humans remain inconsistent. The aim of this study was to systematically review and meta-analytically assess whether LLMs differ from humans in persuasive effectiveness. We identified 7 studies with 17,422 participants, primarily recruited from English-speaking countries, yielding 12 effect size estimates. Egger's test indicated potential small-study effects ($p = .018$), but the trim-and-fill analysis did not impute any missing studies, suggesting a low risk of publication bias. We computed standardized effect sizes as Hedges' $g$. The results show no significant overall difference in persuasive performance between LLMs and humans ($g = 0.02$, $p = .530$). However, we observed substantial heterogeneity across studies ($I^2 = 75.97\%$), suggesting that persuasiveness depends strongly on contextual factors. In separate exploratory moderator analyses, no individual factor (e.g., LLM model, conversation design, or domain) reached statistical significance, which may be due to the limited number of studies. When considered jointly in a combined model, these factors explained a large proportion of the between-study variance ($R^2 = 81.93\%$), and residual heterogeneity was low ($I^2 = 35.51\%$). Although based on a small number of studies, this suggests that LLM model, conversation design, and domain are important contextual factors shaping persuasive performance, and that single-factor tests may understate their influence. Our results highlight that LLMs can match human performance in persuasion, but their success depends strongly on how they are implemented and embedded in communication contexts.
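The two quantities the abstract leans on are Hedges' $g$ (Cohen's $d$ with a small-sample correction) and Higgins' $I^2$ (the share of observed variance attributable to between-study heterogeneity, derived from Cochran's $Q$). As an illustrative sketch only, not the study's actual analysis code (the function names and inputs below are hypothetical), they can be computed as:

```python
import math

def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference with Hedges' small-sample correction."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp              # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)       # correction factor J
    return j * d

def i_squared(effects, variances):
    """Higgins' I^2 (in percent) from Cochran's Q with inverse-variance weights."""
    w = [1 / v for v in variances]
    pooled = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - pooled) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100   # values near 0 mean little heterogeneity
```

With $k = 12$ effect sizes and their sampling variances, `i_squared` reproduces the kind of heterogeneity statistic reported above; an $I^2$ near 76% indicates that most observed dispersion reflects true between-study differences rather than sampling error.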