🤖 AI Summary
This study addresses the risk that large language models (LLMs) may perpetuate harmful myths about autism spectrum disorder (ASD), thereby shaping public understanding and health-related decisions. It presents the first systematic comparison between human participants and leading LLMs (GPT-4, Claude, and Gemini) in identifying autism-related misconceptions, using a standardized autism knowledge scale. In a controlled experiment involving 178 human participants and the three models, humans demonstrated a significantly lower average error rate than the LLMs (36.2% vs. 44.8%, p = .0048), outperforming the models on 18 of 30 items. These findings reveal critical gaps in current AI systems' understanding of neurodiversity and underscore the urgent need to integrate neurodiversity-informed perspectives into AI development and evaluation frameworks.
📝 Abstract
As Large Language Models become ubiquitous sources of health information, understanding their capacity to accurately represent stigmatized conditions is crucial for responsible deployment. This study examines whether leading AI systems perpetuate or challenge misconceptions about Autism Spectrum Disorder, a condition particularly vulnerable to harmful myths. We administered a 30-item instrument measuring autism knowledge to 178 human participants and three state-of-the-art LLMs: GPT-4, Claude, and Gemini. Contrary to the expectation that AI systems would leverage their vast training data to outperform humans, we found the opposite pattern: human participants endorsed significantly fewer myths than the LLMs (36.2% vs. 44.8% error rate; z = -2.59, p = .0048). Humans significantly outperformed the AI systems on 18 of the 30 items. These findings reveal a critical blind spot in current AI systems and have important implications for human-AI interaction design, the epistemology of machine knowledge, and the need to center neurodivergent perspectives in AI development.
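For readers who want to sanity-check the headline statistic, the sketch below runs a pooled two-proportion z-test, one plausible reading of the reported z value. The abstract does not state the per-group response counts or the exact test used, so `n_human` and `n_llm` here are hypothetical values chosen only so the observed rates match 36.2% and 44.8% and the arithmetic lands near the reported z = -2.59 and p = .0048; treat it as a consistency check, not the authors' procedure.

```python
# Hypothetical reconstruction of the reported human-vs-LLM comparison as a
# pooled two-proportion z-test. The response counts are NOT from the paper;
# they are illustrative assumptions only.
from math import sqrt, erfc

def two_proportion_ztest(errors_a, n_a, errors_b, n_b):
    """Pooled two-proportion z-test of H0: error rate A == error rate B."""
    p_a, p_b = errors_a / n_a, errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # One-sided p-value P(Z <= z); note Phi(-2.59) is about .0048,
    # matching the value reported in the abstract.
    p_one_sided = 0.5 * erfc(-z / sqrt(2))
    return z, p_one_sided

# Assumed counts, chosen so the observed rates match 36.2% and 44.8%.
n_human, n_llm = 440, 440
z, p = two_proportion_ztest(round(0.362 * n_human), n_human,
                            round(0.448 * n_llm), n_llm)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")  # prints z = -2.61, p = 0.0045
```

Under these assumed counts the test statistic comes out close to, but not exactly at, the published z = -2.59, which is what one would expect when the true group sizes are unknown.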