Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak interpretability and insufficient domain-knowledge integration of Long Chain-of-Thought (Long-CoT) large language models in molecular discovery, this paper introduces Mol-R1, the first explicit long-chain reasoning optimization framework for text-to-molecule generation. Methodologically, we propose Prior Regulation via In-context Distillation (PRID) to construct a high-quality chemical reasoning dataset, and design the Molecular Iterative Adaptation (MoIA) training paradigm, which builds on the PRID-distilled chemical priors and iteratively combines supervised fine-tuning with Reinforced Policy Optimization (RPO). Our key contribution is the first systematic enhancement of R1-class models' long-chain reasoning capabilities for molecular generation. Mol-R1 achieves significant improvements over state-of-the-art baselines across multiple text-guided molecule generation tasks, demonstrating superior chemical validity, reasoning accuracy, and generation efficiency.

📝 Abstract
Large language models (LLMs), especially Explicit Long Chain-of-Thought (CoT) reasoning models like DeepSeek-R1 and QWQ, have demonstrated powerful reasoning capabilities, achieving impressive performance in commonsense reasoning and mathematical inference. Despite their effectiveness, Long-CoT reasoning models are often criticized for their limited ability and low efficiency in knowledge-intensive domains such as molecule discovery. Success in this field requires a precise understanding of domain knowledge, including molecular structures and chemical principles, which is challenging due to the inherent complexity of molecular data and the scarcity of high-quality expert annotations. To bridge this gap, we introduce Mol-R1, a novel framework designed to improve explainability and reasoning performance of R1-like Explicit Long-CoT reasoning LLMs in text-based molecule generation. Our approach begins with a high-quality reasoning dataset curated through Prior Regulation via In-context Distillation (PRID), a dedicated distillation strategy to effectively generate paired reasoning traces guided by prior regulations. Building upon this, we introduce MoIA, Molecular Iterative Adaptation, a sophisticated training strategy that iteratively combines Supervised Fine-tuning (SFT) with Reinforced Policy Optimization (RPO), tailored to boost the reasoning performance of R1-like reasoning models for molecule discovery. Finally, we examine the performance of Mol-R1 in the text-based molecule reasoning generation task, showing superior performance against existing baselines.
Problem

Research questions and friction points this paper is trying to address.

Enhance Long-CoT reasoning for molecule discovery tasks
Address scarcity of expert molecular knowledge annotations
Improve explainability in text-based molecule generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

PRID for high-quality reasoning dataset curation
MoIA combines SFT and RPO iteratively
Mol-R1 enhances explainability in molecule generation
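The MoIA paradigm described above alternates supervised fine-tuning on PRID-distilled reasoning traces with reward-driven policy optimization. A minimal toy sketch of that alternation is below; it is purely illustrative, not the paper's implementation. All names (`toy_sft_step`, `toy_rpo_step`, `moia_loop`, the single-token "model" of raw logits, and the stand-in reward function) are hypothetical, and plain REINFORCE stands in for the paper's RPO.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def toy_sft_step(logits, target_idx, lr=0.5):
    """One cross-entropy gradient step toward a reference token
    (stand-in for SFT on a distilled reasoning trace)."""
    probs = softmax(logits)
    return [l - lr * (p - (1.0 if i == target_idx else 0.0))
            for i, (l, p) in enumerate(zip(logits, probs))]

def toy_rpo_step(logits, reward_fn, lr=0.5, samples=32, seed=0):
    """REINFORCE-style update: sample tokens from the current policy
    and reinforce them in proportion to their reward (stand-in for RPO)."""
    rng = random.Random(seed)
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(samples):
        a = rng.choices(range(len(logits)), weights=probs)[0]
        r = reward_fn(a)
        for i, p in enumerate(probs):
            grad[i] += r * ((1.0 if i == a else 0.0) - p) / samples
    return [l + lr * g for l, g in zip(logits, grad)]

def moia_loop(logits, target_idx, reward_fn, rounds=5):
    """Iterative adaptation: alternate an SFT step with an RPO step,
    mirroring the SFT/RPO alternation that MoIA describes."""
    for _ in range(rounds):
        logits = toy_sft_step(logits, target_idx)
        logits = toy_rpo_step(logits, reward_fn)
    return logits

# Toy run: token 2 plays the role of a "chemically valid" generation,
# so the reward agrees with the supervised target.
final = moia_loop([0.0, 0.0, 0.0], target_idx=2,
                  reward_fn=lambda a: 1.0 if a == 2 else 0.0)
print(max(range(3), key=lambda i: final[i]))  # the rewarded token should dominate
```

The point of the sketch is the control flow: supervision anchors the policy to curated traces while the reward step pushes it further, which is the iterative interplay the paper attributes to MoIA.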