ChemATP: A Training-Free Chemical Reasoning Framework for Large Language Models

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit inaccurate reasoning in molecular science primarily due to standard string-based molecular representations lacking explicit, fine-grained chemical priors. Training-based approaches statically embed domain knowledge into model parameters—compromising generalizability and knowledge updatability—while training-free methods rely solely on coarse-grained prompts, failing to support atom-level reasoning. Method: We propose the first training-free chemical reasoning framework that decouples domain knowledge from the inference engine, enabling dynamic retrieval of atom-level chemical textual knowledge under a frozen LLM. Contribution/Results: Our approach introduces (1) the first atom-level structured chemical text knowledge base, and (2) a chemistry-aware RAG mechanism with interface adaptation for frozen LLMs. Evaluated across diverse chemical reasoning tasks, it achieves 12.7–34.1% absolute accuracy gains over training-free baselines, matches state-of-the-art trained models, and preserves interpretability, knowledge updatability, and general-purpose reasoning capability.

📝 Abstract
Large Language Models (LLMs) exhibit strong general reasoning but struggle in molecular science due to the lack of explicit chemical priors in standard string representations. Current solutions face a fundamental dilemma. Training-based methods inject priors into parameters, but this static coupling hinders rapid knowledge updates and often compromises the model's general reasoning capabilities. Conversely, existing training-free methods avoid these issues but rely on surface-level prompting, failing to provide the fine-grained atom-level priors essential for precise chemical reasoning. To address this issue, we introduce ChemATP, a framework that decouples chemical knowledge from the reasoning engine. By constructing the first atom-level textual knowledge base, ChemATP enables frozen LLMs to explicitly retrieve and reason over this information dynamically. This architecture ensures interpretability and adaptability while preserving the LLM's intrinsic general intelligence. Experiments show that ChemATP significantly outperforms training-free baselines and rivals state-of-the-art training-based models, demonstrating that explicit prior injection is a competitive alternative to implicit parameter updates.
Problem

Research questions and friction points this paper is trying to address.

Enables LLMs to perform precise chemical reasoning without training
Decouples chemical knowledge from reasoning engine for adaptability
Provides atom-level priors to overcome limitations of surface-level prompting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples chemical knowledge from reasoning engine
Uses atom-level textual knowledge base for retrieval
Enables dynamic reasoning with frozen LLMs
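The retrieval idea behind these bullets can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's implementation: the toy knowledge base `ATOM_KB`, the crude SMILES tokenizer, and the prompt layout are all hypothetical stand-ins for ChemATP's atom-level knowledge base and chemistry-aware RAG interface.

```python
# Hypothetical sketch of atom-level retrieval for a frozen LLM.
# ATOM_KB, the tokenizer, and the prompt format are illustrative
# assumptions, not ChemATP's actual components.
import re

# Toy atom-level knowledge base: element symbol -> textual chemical prior.
ATOM_KB = {
    "C": "carbon: typically 4 bonds; backbone of organic scaffolds",
    "N": "nitrogen: typically 3 bonds; lone pair enables H-bond acceptance",
    "O": "oxygen: typically 2 bonds; strongly electronegative",
    "Cl": "chlorine: typically 1 bond; electron-withdrawing halogen",
}

def atom_tokens(smiles: str) -> list[str]:
    """Crude SMILES atom tokenizer: match two-letter elements before one-letter."""
    return re.findall(r"Cl|Br|[BCNOPSFI]", smiles)

def build_prompt(smiles: str, question: str) -> str:
    """Retrieve atom-level priors and prepend them to the query,
    so a frozen LLM can reason over explicit chemical knowledge."""
    priors = {ATOM_KB[t] for t in atom_tokens(smiles) if t in ATOM_KB}
    context = "\n".join(sorted(priors))
    return (f"Atom-level priors:\n{context}\n\n"
            f"Molecule: {smiles}\nQuestion: {question}")

prompt = build_prompt("CCO", "How many hydrogen-bond donors does this molecule have?")
print(prompt)
```

Because the knowledge base is plain text outside the model, updating a chemical fact means editing a dictionary entry rather than retraining, which is the updatability advantage the bullets describe.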
Mingxu Zhang
The Hong Kong University of Science and Technology (Guangzhou)
Dazhong Shen
Nanjing University of Aeronautics and Astronautics
Data Mining · Generative AI
Qi Zhang
Shanghai AI Laboratory
Ying Sun
The Hong Kong University of Science and Technology (Guangzhou)