🤖 AI Summary
This study evaluates how reliably search engines and AI systems answer factual yes/no questions in Chinese-language web environments. The authors construct the first query-level fact-checking dataset of binary factual questions derived from Chinese search logs. Using an evidence-driven truth annotation protocol, Baidu Index–based regional popularity analysis, and a multi-system comparative framework, they systematically assess nine system types, including traditional search engines, standalone large language models, and retrieval-augmented AI agents. Among systems that provide definitive answers, accuracy is comparable (73.2%–78.9%), but answer rates vary substantially. A consistent polarity bias favoring “yes” over “no” responses appears across systems. The study also finds that high-risk health-related queries are concentrated in specific provinces, highlighting potential region-specific exposure to misinformation.
📝 Abstract
Large Language Models (LLMs) are increasingly integrated into search services, providing direct answers that can reduce users' reliance on traditional result pages. Yet their factual reliability in non-English web ecosystems remains poorly understood, particularly when answering real user queries. We introduce a fact-checking dataset of 12,161 Chinese Yes/No questions derived from real-world online search logs and develop a unified evaluation pipeline to compare three information-access paradigms: traditional search engines, standalone LLMs, and AI-generated overview modules. Our analysis reveals substantial differences in factual accuracy and topic-level variability across systems. By combining these performance results with real-world Baidu Index statistics, we further estimate Chinese users' potential exposure to incorrect factual information across regions. These findings highlight structural risks in AI-mediated search and underscore the need for more reliable and transparent information-access tools for the digital world.
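The summary separates two quantities that are easy to conflate: how often a system gives a definitive answer, and how often that answer is correct when it does. The sketch below shows one way these might be computed for a single system, together with a simple polarity measure. It is an illustration under stated assumptions, not the authors' pipeline: the `Judgement` record, the `summarize` function, and the definition of `yes_bias` (share of predicted "yes" minus share of gold "yes" among answered questions) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Judgement:
    """One evaluated query (hypothetical record layout)."""
    gold: bool                 # annotated truth: True = "yes", False = "no"
    predicted: Optional[bool]  # system answer; None = no definitive answer


def summarize(judgements: Sequence[Judgement]) -> dict:
    """Compute answer rate, accuracy on answered queries, and a yes-bias score."""
    total = len(judgements)
    # Keep only the queries where the system committed to yes or no.
    answered = [j for j in judgements if j.predicted is not None]
    n = len(answered)

    correct = sum(j.predicted == j.gold for j in answered)
    yes_predicted = sum(bool(j.predicted) for j in answered)
    yes_gold = sum(j.gold for j in answered)

    return {
        # Fraction of all queries that received a definitive answer.
        "answer_rate": n / total if total else 0.0,
        # Accuracy restricted to answered queries.
        "accuracy_when_answered": correct / n if n else 0.0,
        # Assumed bias measure: positive means "yes" is over-predicted
        # relative to the gold labels of the answered queries.
        "yes_bias": (yes_predicted - yes_gold) / n if n else 0.0,
    }


if __name__ == "__main__":
    sample = [
        Judgement(gold=True, predicted=True),
        Judgement(gold=False, predicted=True),
        Judgement(gold=False, predicted=False),
        Judgement(gold=True, predicted=None),
    ]
    print(summarize(sample))
```

Reporting accuracy only over answered queries explains how systems can look comparable on accuracy while differing widely in answer rate: a system that abstains on hard questions is not penalized by the first metric, so the two should be read together.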