🤖 AI Summary
This study investigates the reliability of large language models (LLMs) in scientific question-answering and their potential to propagate fringe scientific viewpoints. The authors modify custom LLMs to prioritize knowledge from selected fringe papers on the fine-structure constant and gravitational waves, then compare the outputs against responses from domain experts and standard LLMs. The experiments show that the altered models generate fluent, persuasive answers that contradict mainstream science yet are difficult for non-experts to recognize as misleading. These findings underscore the risk that LLMs could be misused to disseminate scientific misinformation in the absence of expert oversight, and they indicate that LLMs cannot substitute for professional scientific judgment.
📝 Abstract
This paper is under review in AI and Ethics. This study examines whether large language models (LLMs) can reliably answer scientific questions and demonstrates how easily they can be influenced by fringe scientific material. The authors modified custom LLMs to prioritise knowledge in selected fringe papers on the fine-structure constant and gravitational waves, then compared their responses with those of domain experts and standard LLMs. The altered models produced fluent, convincing answers that contradicted scientific consensus and that non-experts found difficult to recognise as misleading. The results show that LLMs are vulnerable to manipulation and cannot replace expert judgment, highlighting risks for public understanding of science and the potential spread of misinformation.