🤖 AI Summary
This study systematically examines differences in persuasive efficacy between large language models (LLMs) and humans. Through a meta-analysis of 7 empirical studies yielding 12 effect-size estimates, employing Hedges' *g* effect sizes, Egger's regression test, and the trim-and-fill method, we find no statistically significant difference in overall persuasiveness (*g* = 0.02, *p* = .530). The key contribution lies in identifying the joint moderating role of contextual factors: model type, dialogue design, and application domain. Considered together in a combined model, these factors account for 81.93% of effect-size heterogeneity, substantially exceeding the explanatory power of any single moderator. Results indicate that LLMs' persuasive effectiveness is determined not by whether they replace humans, but by systematic optimization of task alignment and interaction design. These findings provide empirical grounding and a theoretical framework for the credible, effective deployment of LLMs in domains such as political communication and digital marketing.
📝 Abstract
Large language models (LLMs) are increasingly used for persuasion, such as in political communication and marketing, where they affect how people think, choose, and act. Yet empirical findings on how effectively LLMs persuade compared to humans remain inconsistent. The aim of this study was to systematically review and meta-analytically assess whether LLMs differ from humans in persuasive effectiveness. We identified 7 studies with 17,422 participants, primarily recruited from English-speaking countries, yielding 12 effect size estimates. Egger's test indicated potential small-study effects ($p = .018$), but the trim-and-fill analysis did not impute any missing studies, suggesting a low risk of publication bias. We computed standardized effect sizes as Hedges' $g$. The results show no significant overall difference in persuasive performance between LLMs and humans ($g = 0.02$, $p = .530$). However, we observed substantial heterogeneity across studies ($I^2 = 75.97\%$), suggesting that persuasiveness depends strongly on contextual factors. In separate exploratory moderator analyses, no individual factor (e.g., LLM model, conversation design, or domain) reached statistical significance, which may be due to the limited number of studies. When considered jointly in a combined model, these factors explained a large proportion of the between-study variance ($R^2 = 81.93\%$), and residual heterogeneity was low ($I^2 = 35.51\%$). Although based on a small number of studies, this suggests that LLM model, conversation design, and domain are important contextual factors shaping persuasive performance, and that single-factor tests may understate their influence. Our results highlight that LLMs can match human performance in persuasion, but their success depends strongly on how they are implemented and embedded in communication contexts.
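The two quantities the abstract leans on are Hedges' $g$ (Cohen's $d$ with a small-sample correction) and Higgins' $I^2$ (the share of observed variance attributable to between-study heterogeneity, derived from Cochran's $Q$). As an illustrative sketch only, not the study's actual analysis code (the function names and inputs below are hypothetical), they can be computed as:

```python
import math

def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference with Hedges' small-sample correction."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp              # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)       # correction factor J
    return j * d

def i_squared(effects, variances):
    """Higgins' I^2 (in percent) from Cochran's Q with inverse-variance weights."""
    w = [1 / v for v in variances]
    pooled = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - pooled) ** 2 for wi, g in zip(w, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100   # values near 0 mean little heterogeneity
```

With $k = 12$ effect sizes and their sampling variances, `i_squared` reproduces the kind of heterogeneity statistic reported above; an $I^2$ near 76% indicates that most observed dispersion reflects true between-study differences rather than sampling error.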