🤖 AI Summary
This study addresses the challenge of adapting large language models (LLMs) for preschool science education. It presents the first interdisciplinary (biology, chemistry, physics) empirical evaluation targeting children aged 3–6. Grounded in developmental psychology and early science education standards, we developed a multidimensional evaluation framework assessing accuracy, comprehensibility, engagement, and developmental appropriateness. Thirty early childhood educators conducted mixed qualitative and quantitative assessments of age-appropriate explanations generated by GPT-4, Claude, Gemini, and Llama. Results show that Claude significantly outperforms the other models in biology explanations, that model performance correlates strongly with the abstraction level of the discipline, and that systematic variations appear across both models and evaluation dimensions. This work establishes the first empirically grounded benchmark for LLMs in preschool science education and provides actionable guidance on prompt engineering and pedagogical integration.
📝 Abstract
Early childhood science education is crucial for developing scientific literacy, yet translating complex scientific concepts into age-appropriate content remains challenging for educators. Our study evaluates four leading Large Language Models (LLMs), GPT-4, Claude, Gemini, and Llama, on their ability to generate preschool-appropriate scientific explanations across biology, chemistry, and physics. Through systematic evaluation by 30 nursery teachers using established pedagogical criteria, we identify significant differences in the models' ability to create engaging, accurate, and developmentally appropriate content. Unexpectedly, Claude outperformed the other models, particularly on biological topics, while all LLMs struggled with abstract chemical concepts. Our findings provide practical insights for educators leveraging AI in early science education and offer guidance for developers working to enhance LLMs' educational applications. The results highlight both the potential and the current limitations of using LLMs to bridge the early science literacy gap.