ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

📅 2024-08-14
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
Existing large language models (LLMs) for chemistry lack visual understanding capabilities, limiting their ability to process multimodal chemical data such as molecular structure diagrams and reaction schemes. Method: We propose ChemVLM—the first open-source multimodal LLM tailored for chemistry—capable of cross-modal understanding and reasoning over molecular images, SMILES, InChI strings, and chemistry examination questions. ChemVLM employs an LLM-expansion architecture integrating a vision encoder with bilingual (Chinese–English) multimodal pretraining, augmented by task-aligned fine-tuning. We further construct the first domain-specific multimodal training dataset for chemistry and design three novel evaluation benchmarks: OCR-based recognition, multimodal chemical reasoning (MMCR), and molecular understanding. Results: Experiments demonstrate that ChemVLM achieves state-of-the-art performance among open-source models on multiple chemical multimodal tasks, improving accuracy by 12.6% over general-purpose multimodal models on the MMCR benchmark.

📝 Abstract
Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper, we introduce ChemVLM, an open-source chemical multimodal large language model specifically designed for chemical applications. ChemVLM is trained on a carefully curated bilingual multimodal dataset that enhances its ability to understand both textual and visual chemical information, including molecular structures, reactions, and chemistry examination questions. We develop three datasets for comprehensive evaluation, tailored to Chemical Optical Character Recognition (OCR), Multimodal Chemical Reasoning (MMCR), and Multimodal Molecule Understanding tasks. We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks. Experimental results demonstrate that ChemVLM achieves competitive performance across all evaluated tasks. Our model can be found at https://huggingface.co/AI4Chem/ChemVLM-26B.
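The abstract describes an OCR-style evaluation in which the model reads a molecular structure image and emits a machine-readable string such as SMILES. As a minimal illustration of how such a task might be scored, the sketch below computes exact-match accuracy over predicted SMILES strings. The metric and the toy data are illustrative assumptions, not the paper's published evaluation protocol (which may use chemistry-aware canonicalization rather than plain string comparison).

```python
# Hedged sketch: scoring an OCR-style chemical recognition task by
# exact match on SMILES strings. This is an illustrative assumption,
# not ChemVLM's published evaluation protocol.

def exact_match_accuracy(predictions, references):
    """Fraction of predicted SMILES that match the reference exactly
    after trivial whitespace normalization."""
    assert len(predictions) == len(references) and references
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

# Hypothetical model outputs for three molecule images vs. ground truth.
preds = ["CCO", "c1ccccc1", "CC(=O)O"]    # ethanol, benzene, acetic acid
refs  = ["CCO", "c1ccccc1", "CC(=O)OC"]   # third reference is methyl acetate
print(exact_match_accuracy(preds, refs))  # 2 of 3 predictions match
```

In practice, a chemistry-aware comparison (e.g. canonicalizing both strings with a cheminformatics toolkit before comparing) would avoid penalizing predictions that denote the same molecule with a different but valid SMILES string.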
Problem

Research questions and friction points this paper is trying to address.

Integrate visual and textual chemical information
Enhance understanding of molecular structures and reactions
Develop a multimodal model for chemical tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal LLM tailored to chemistry tasks
Bilingual (Chinese-English) multimodal training dataset covering text and visuals
Benchmarked against open-source and proprietary multimodal LLMs
👥 Authors
Junxian Li
NSEC lab, Shanghai Jiaotong University
Di Zhang
Shanghai Artificial Intelligence Laboratory, Fudan University
Xunzhi Wang
Shanghai Artificial Intelligence Laboratory, Nankai University
Zeying Hao
Shanghai Artificial Intelligence Laboratory, University of Science and Technology of China
Jingdi Lei
Nanyang Technological University
Qian Tan
Shanghai Artificial Intelligence Laboratory
Cai Zhou
Massachusetts Institute of Technology
Wei Liu
Shanghai Artificial Intelligence Laboratory, Shanghai Jiaotong University
Weiyun Wang
Shanghai AI Laboratory, Fudan University
Zhe Chen
Shanghai Artificial Intelligence Laboratory, Nanjing University
Wenhai Wang
Shanghai Artificial Intelligence Laboratory, The Chinese University of Hong Kong
Wei Li
Shanghai Artificial Intelligence Laboratory
Shufei Zhang
Shanghai Artificial Intelligence Laboratory
Mao Su
Shanghai AI Laboratory
Wanli Ouyang
Shanghai Artificial Intelligence Laboratory
Yuqiang Li
Central South University
Dongzhan Zhou
Shanghai AI Lab