AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) excel at textual reasoning, but the field lacks a standardized, materials-science-oriented benchmark for spatial reasoning, in particular one that systematically evaluates how well models understand 3D atomic structures such as crystals. To address this gap, we introduce the first crystallographic spatial reasoning benchmark grounded in Crystallographic Information Files (CIFs), comprising three task categories that jointly require geometric and chemical knowledge: structural editing, CIF perception, and property-guided modeling. Experimental results reveal high error rates among state-of-the-art LLMs in atom-level structural modification and in interpreting the CIF format itself, exposing deficiencies in their spatial representation and geometric reasoning. The benchmark provides a reproducible, extensible framework for assessing spatial reasoning in scientific AI, and offers guidance for improving model architectures and training toward robust 3D materials understanding.

📝 Abstract
Large Language Models (LLMs) excel at textual reasoning and are beginning to develop spatial understanding, prompting the question of whether these abilities can be combined for complex, domain-specific tasks. This question is essential in fields like materials science, where a deep understanding of 3D atomic structures is fundamental. While initial studies have successfully applied LLMs to tasks involving pure crystal generation or coordinate understanding, a standardized benchmark that systematically evaluates their core reasoning abilities across diverse atomic structures has been notably absent. To address this gap, we introduce the AtomWorld benchmark to evaluate LLMs on tasks based on Crystallographic Information Files (CIFs), a standard structure representation format. These tasks, which include structural editing, CIF perception, and property-guided modeling, reveal a critical limitation: current models, despite establishing promising baselines, consistently fail at structural understanding and spatial reasoning. Our experiments show that these models make frequent errors on structure modification tasks, and even on basic understanding of the CIF format, potentially leading to cumulative errors in subsequent analysis and materials insights. By defining these standardized tasks, AtomWorld lays the groundwork for advancing LLMs toward robust atomic-scale modeling, which is crucial for accelerating materials research and automating scientific workflows.
Problem

Research questions and friction points this paper is trying to address.

Evaluating spatial reasoning in LLMs for crystalline materials
Assessing structural understanding through standardized benchmark tasks
Identifying limitations in atomic structure modification and analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark evaluates LLMs on crystal structure tasks
Uses CIF format for standardized spatial reasoning assessment
Tests structural editing and property-guided modeling capabilities
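
To make the idea of a CIF-based structural editing task concrete, the following is a minimal sketch. It is an illustration only and is not taken from the AtomWorld paper: the toy CIF, the `Atom` class, and the functions `parse_atom_sites`, `move_atom`, and `matches` are all hypothetical names, and the parser assumes a simple whitespace-separated CIF with a single atom-site loop. The sketch reads fractional coordinates, applies a "move this atom" edit, and checks a predicted structure against a target under periodic boundary conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    label: str
    frac: tuple  # fractional coordinates (x, y, z)


# Toy CIF fragment, invented for this example (not from the benchmark).
EXAMPLE_CIF = """\
data_toy
_cell_length_a 4.0
_cell_length_b 4.0
_cell_length_c 4.0
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
Na1 0.0 0.0 0.0
Cl1 0.5 0.5 0.5
"""


def parse_atom_sites(cif_text):
    """Read labels and fractional coordinates from the first atom-site loop.

    Assumption: plain whitespace-separated values, no quoting, no
    uncertainties such as 0.5(1). Real CIFs need a proper parser.
    """
    columns, atoms = [], []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            if atoms:  # the atom-site loop has ended
                break
            columns = []
        elif line.startswith("_"):
            if atoms:
                break
            columns.append(line.split()[0])
        elif "_atom_site_fract_x" in columns:
            row = dict(zip(columns, line.split()))
            frac = tuple(float(row[f"_atom_site_fract_{ax}"]) for ax in "xyz")
            atoms.append(Atom(row["_atom_site_label"], frac))
    return atoms


def move_atom(atoms, label, delta):
    """Return a copy of the structure with one atom shifted by `delta`
    (fractional units) and wrapped back into the unit cell."""
    edited = []
    for atom in atoms:
        if atom.label == label:
            frac = tuple((c + d) % 1.0 for c, d in zip(atom.frac, delta))
            atom = Atom(atom.label, frac)
        edited.append(atom)
    return edited


def matches(predicted, target, tol=1e-3):
    """Compare two structures atom by atom, treating coordinates that
    differ by a whole lattice translation as identical."""
    if [a.label for a in predicted] != [a.label for a in target]:
        return False
    for p, t in zip(predicted, target):
        for c_p, c_t in zip(p.frac, t.frac):
            d = abs(c_p - c_t) % 1.0
            if min(d, 1.0 - d) > tol:
                return False
    return True
```

In a setup like this, the reference answer for a prompt such as "move Na1 by (0.25, 0, -0.1)" would come from `move_atom`, and a model's edited CIF would be parsed and scored with `matches`. Comparing in fractional coordinates is a simplification; a distance tolerance in ångströms would require the cell parameters as well.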
Authors
Taoyuze Lv
Suzhou Institute for Advanced Research, University of Science and Technology of China
Alexander Chen
University of New South Wales
Fengyu Xie
Suzhou Institute for Advanced Research, University of Science and Technology of China
Chu Wu
Suzhou Institute for Advanced Research, University of Science and Technology of China
Jeffrey Meng
University of New South Wales
Dongzhan Zhou
Researcher at Shanghai AI Lab
AI4Science, computer vision, deep learning
Bram Hoex
Professor, UNSW Sydney
Solar Energy, Solar Cells, Surface Passivation, Advanced Characterisation
Zhicheng Zhong
Suzhou Institute for Advanced Research, University of Science and Technology of China
Tong Xie
Green Dynamics & University of New South Wales
Solar Cells, Large Language Models, Cheminformatics, Nano Materials