AttriLens-Mol: Attribute Guided Reinforcement Learning for Molecular Property Prediction with Large Language Models

📅 2025-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) for molecular property prediction often rely heavily on handcrafted prompts, produce verbose reasoning chains, and offer poor interpretability. Method: We propose an attribute-guided reinforcement learning framework that combines three synergistic reward signals: a structured-output format constraint, an attribute-count reward that discourages enumerating irrelevant attributes, and a rationality reward that verifies attribute relevance using advanced LLMs and RDKit. These signals implicitly elicit the model's intrinsic chemical knowledge and steer it toward generating highly relevant, interpretable molecular features. Contribution/Results: Without requiring extensive labeled data, the method surpasses multiple supervised fine-tuning baselines and state-of-the-art closed-source LLMs under few-shot settings. Moreover, the extracted attributes achieve state-of-the-art performance when fed into a simple decision tree, improving both predictive accuracy and model transparency.
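The three reward signals can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the tag name `<attributes>`, the count threshold, and the equal weighting are invented here, and the paper delegates the rationality judgment to external LLMs and RDKit, which this sketch takes as precomputed boolean inputs.

```python
import re

def format_reward(output: str) -> float:
    """1.0 if attributes appear in a structured block, else 0.0.
    The '<attributes>...</attributes>' tag name is a hypothetical choice."""
    return 1.0 if re.search(r"<attributes>.*</attributes>", output, re.S) else 0.0

def count_reward(attributes: list[str], max_attrs: int = 5) -> float:
    """Discourage enumerating too many attributes (threshold is assumed)."""
    return 1.0 if 0 < len(attributes) <= max_attrs else 0.0

def rationality_reward(related_flags: list[bool]) -> float:
    """Fraction of attributes judged relevant by an external verifier
    (the paper uses advanced LLMs and RDKit; here the judgments are inputs)."""
    return sum(related_flags) / len(related_flags) if related_flags else 0.0

def total_reward(output: str, attributes: list[str],
                 related_flags: list[bool],
                 weights: tuple = (1.0, 1.0, 1.0)) -> float:
    # Equal weighting is an assumption, not the paper's configuration.
    w1, w2, w3 = weights
    return (w1 * format_reward(output)
            + w2 * count_reward(attributes)
            + w3 * rationality_reward(related_flags))
```

In an RL loop, `total_reward` would score each sampled completion before a policy-gradient update; the actual reward shapes and scales used in AttriLens-Mol may differ.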

📝 Abstract
Large Language Models (LLMs) have shown promise in assisting molecular property prediction tasks but often rely on human-crafted prompts and chain-of-thought templates. While recent advanced large reasoning models like DeepSeek-R1 employ reinforcement learning for an extended "thinking" process, their reasoning can be verbose and lack relevance. We introduce AttriLens-Mol, an attribute-guided reinforcement learning framework for molecular property prediction with LLMs. AttriLens-Mol steers the model's reasoning by using: (1) a format reward encouraging attribute-based structured output, (2) a count reward to avoid enumerating irrelevant attributes, and (3) a rationality reward using advanced LLMs and RDKit to verify the relatedness of the generated attributes. This approach implicitly elicits the model's inherent knowledge of relevant molecular attributes during reasoning, enabling more effective prediction of molecular properties. Experiments on both in-distribution and out-of-distribution datasets show that training both 7B-size R1-Distilled-Qwen2.5 and R1-Distilled-LLaMA3.1 models on 4,000 samples with our proposed AttriLens-Mol method significantly boosts performance, yielding comparable or better results than supervised fine-tuning models (Mol-Instructions, ChemDFM, etc.) and advanced models (GPT-3.5, GPT-4o, DeepSeek-V3, DeepSeek-R1, etc.). Further, our extracted attributes for the target property, when used as features for an interpretable decision tree model, yield superior performance compared to attributes generated by prompting LLMs. This shows that AttriLens-Mol effectively elicits more relevant and predictive molecular attributes, leading to enhanced interpretability and performance for property prediction. We release the code at https://github.com/szu-tera/AttriLens-Mol.
Problem

Research questions and friction points this paper is trying to address.

Improves molecular property prediction using attribute-guided reinforcement learning
Reduces irrelevant reasoning in large language models for chemistry tasks
Enhances interpretability and performance of molecular attribute extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute-guided reinforcement learning framework
Structured output with format reward
Rationality reward with LLMs and RDKit
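The abstract also reports that the extracted attributes serve as features for an interpretable decision tree. A minimal sketch of the featurization step, assuming per-molecule attribute lists from the trained model are one-hot encoded over a shared vocabulary; the attribute names and molecule IDs below are invented placeholders, not data from the paper:

```python
def attributes_to_matrix(mol_attrs, vocabulary=None):
    """One-hot encode each molecule's attribute list over a shared vocabulary.

    mol_attrs: dict mapping molecule id -> list of attribute strings.
    Returns (vocabulary, matrix) where matrix maps id -> binary feature row.
    """
    if vocabulary is None:
        vocabulary = sorted({a for attrs in mol_attrs.values() for a in attrs})
    matrix = {mol: [1 if v in attrs else 0 for v in vocabulary]
              for mol, attrs in mol_attrs.items()}
    return vocabulary, matrix

# Placeholder output of the RL-trained model (invented for illustration).
extracted = {
    "mol_A": ["aromatic ring", "hydrogen-bond donor"],
    "mol_B": ["aromatic ring", "high logP"],
}
vocab, X = attributes_to_matrix(extracted)
# Each row of X could now be fed to a decision tree learner,
# e.g. scikit-learn's DecisionTreeClassifier, together with property labels.
```

Because each feature column corresponds to a named chemical attribute, the resulting tree's splits are directly human-readable, which is the interpretability benefit the paper highlights.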