MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning

📅 2025-11-21
🤖 AI Summary
Existing optical chemical structure recognition (OCSR) systems struggle to accurately interpret stereochemical information—particularly in distinguishing wedge bonds, dashed bonds, ring conformations, and spatial atomic arrangements. To address this, we propose MolSight, the first framework integrating three sequential stages: SMILES-sequence pretraining, multi-granularity supervised fine-tuning, and GRPO-based reinforcement learning. We introduce auxiliary tasks—chemical bond classification and atomic localization—and construct the first stereochemistry-specific OCSR benchmark dataset. MolSight combines attention mechanisms with a lightweight architecture, achieving state-of-the-art performance across multiple standard OCSR benchmarks. It significantly improves stereoisomer recognition accuracy while reducing model parameters by 32% and accelerating inference speed by 1.8×.

📝 Abstract
Optical Chemical Structure Recognition (OCSR) plays a pivotal role in modern chemical informatics, enabling the automated conversion of chemical structure images from scientific literature, patents, and educational materials into machine-readable molecular representations. This capability is essential for large-scale chemical data mining, drug discovery pipelines, and Large Language Model (LLM) applications in related domains. However, existing OCSR systems struggle to recognize stereochemical information accurately because the visual cues that distinguish stereoisomers, such as wedge and dash bonds, ring conformations, and spatial arrangements, are subtle. To address these challenges, we propose MolSight, a comprehensive learning framework for OCSR that employs a three-stage training paradigm. In the first stage, we pre-train on large-scale but noisy datasets to give the model fundamental perception capabilities for chemical structure images. In the second stage, we perform multi-granularity fine-tuning on datasets with richer supervisory signals, systematically exploring how auxiliary tasks, specifically chemical bond classification and atom localization, contribute to molecular formula recognition. Finally, we employ reinforcement learning for post-training optimization and introduce a novel stereochemical structure dataset. Remarkably, we find that even with MolSight's relatively compact parameter size, the Group Relative Policy Optimization (GRPO) algorithm can further enhance the model's performance on stereochemical molecules. Extensive experiments across diverse datasets demonstrate that MolSight achieves state-of-the-art performance in (stereo)chemical optical structure recognition.
Problem

Research questions and friction points this paper is trying to address.

Automating conversion of chemical structure images to machine-readable molecular representations
Improving recognition of stereochemical information from subtle visual cues
Addressing challenges in chemical data mining and drug discovery pipelines
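
To make the stereochemistry problem concrete, the sketch below compares two SMILES strings that differ only in their chirality markers. It is a naive, text-level illustration rather than anything taken from the paper: the helper `strip_stereo` is a hypothetical name, and it simply deletes the `@`, `/`, and `\` characters that SMILES uses for tetrahedral and double-bond stereochemistry. A real pipeline would likely rely on a cheminformatics toolkit for canonicalization instead.

```python
import re


def strip_stereo(smiles: str) -> str:
    """Hypothetical helper: drop SMILES stereo markers (@, @@, /, \\).

    This is a plain string operation, not a chemistry-aware
    canonicalization; it only serves to show what information is lost.
    """
    return re.sub(r"[@/\\]", "", smiles)


# Two enantiomers of alanine: same atoms and bonds, mirrored stereocentre.
alanine_a = "C[C@H](N)C(=O)O"
alanine_b = "C[C@@H](N)C(=O)O"

# As full SMILES strings the two molecules are different ...
print(alanine_a == alanine_b)
# ... but once stereo markers are ignored they become indistinguishable.
print(strip_stereo(alanine_a) == strip_stereo(alanine_b))
```

An OCSR system that misreads a wedge bond as a dash bond would, in effect, output the second string where the first was drawn, which is why a connectivity-only metric can hide this class of error.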
Innovation

Methods, ideas, or system contributions that make the work stand out.

SMILES pretraining for chemical image perception
Multi-granularity fine-tuning with auxiliary tasks
Reinforcement learning optimization for stereochemical recognition
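
The abstract names Group Relative Policy Optimization (GRPO) for the reinforcement learning stage. As a rough sketch of the core idea, GRPO scores each sampled output relative to the other samples drawn for the same input, normalizing rewards by the group mean and standard deviation. The reward used here (exact string match against a reference SMILES) and the function names are assumptions for illustration only; the summary above does not specify MolSight's actual reward design.

```python
from statistics import mean, pstdev


def exact_match_reward(predicted: str, reference: str) -> float:
    """Hypothetical reward: 1.0 for an exact SMILES string match, else 0.0."""
    return 1.0 if predicted == reference else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same input.

    Uses the population standard deviation (an assumption); eps avoids
    division by zero when every sample receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: four hypothetical predictions for the same input image.
reference = "C[C@H](N)C(=O)O"
samples = [
    "C[C@H](N)C(=O)O",   # correct, stereocentre preserved
    "C[C@@H](N)C(=O)O",  # wrong enantiomer
    "CC(N)C(=O)O",       # stereo information dropped
    "C[C@H](N)C(=O)O",   # correct
]
rewards = [exact_match_reward(s, reference) for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

In this sketch, predictions that preserve the stereocentre receive positive advantages and the others negative ones, so the policy update would favor stereo-correct outputs without needing a separate value model.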
Wenrui Zhang
Meta Platforms, Inc. (previous: UC-Santa Barbara)
Neuromorphic computing, Spiking neural networks, Machine learning, Deep neural networks, Computer vision
Xinggang Wang
Professor, Huazhong University of Science and Technology
Artificial Intelligence, Computer Vision, Autonomous Driving, Object Detection, Object Segmentation
Bin Feng
School of Electronic Information and Communications, Huazhong University of Science and Technology
Wenyu Liu
School of Electronic Information and Communications, Huazhong University of Science and Technology