🤖 AI Summary
Outdated or contaminated training data can cause multimodal molecule language models (MoLMs) to encode erroneous knowledge, compromising downstream scientific discovery. Method: This work applies knowledge editing to MoLMs for the first time, proposing MolEdit, a fine-grained editing framework for bidirectional molecule–text generation (molecule-to-caption and caption-to-molecule). It combines a Multi-Expert Knowledge Adapter, which routes each edit to experts specialized in different facets of molecular knowledge, with an Expertise-Aware Editing Switcher, which activates the adapters only when an input closely matches a stored edit across all expertise. Together these support knowledge routing and matching across SMILES strings, molecular graphs, and textual descriptions, so that updating specific molecular knowledge interferes minimally with unrelated knowledge. Contribution/Results: On two popular MoLM backbones, MolEdit achieves up to 18.8% higher Reliability and 12.0% better Locality than baseline methods while maintaining efficiency. The work also introduces MEBench, a benchmark that evaluates edits for Reliability, Locality, and Generality.
📝 Abstract
Understanding and continuously refining multimodal molecular knowledge is crucial for advancing biomedicine, chemistry, and materials science. Molecule language models (MoLMs) have become powerful tools in these domains, integrating structural representations (e.g., SMILES strings, molecular graphs) with rich contextual descriptions (e.g., physicochemical properties). However, MoLMs can encode and propagate inaccuracies due to outdated web-mined training corpora or malicious manipulation, jeopardizing downstream discovery pipelines. While knowledge editing has been explored for general-domain AI, its application to MoLMs remains uncharted and presents unique challenges due to the multifaceted and interdependent nature of molecular knowledge. In this paper, we take the first step toward MoLM editing for two critical tasks: molecule-to-caption generation and caption-to-molecule generation. To address molecule-specific challenges, we propose MolEdit, a framework that enables targeted modifications while preserving unrelated molecular knowledge. MolEdit combines a Multi-Expert Knowledge Adapter, which routes edits to specialized experts for different molecular facets, with an Expertise-Aware Editing Switcher, which activates the adapters only when the input closely matches the stored edits across all expertise, minimizing interference with unrelated knowledge. To systematically evaluate editing performance, we introduce MEBench, a comprehensive benchmark assessing multiple dimensions, including Reliability (accuracy of the edit), Locality (preservation of unrelated knowledge), and Generality (robustness to rephrased queries). Across extensive experiments on two popular MoLM backbones, MolEdit delivers up to 18.8% higher Reliability and 12.0% better Locality than baselines while maintaining efficiency. The code is available at: https://github.com/LzyFischer/MolEdit.
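To make the gating idea behind the Expertise-Aware Editing Switcher concrete, the following is a minimal sketch and not the MolEdit implementation. It assumes that each stored edit keeps one key embedding per molecular facet, that matching uses cosine similarity, and that "across all expertise" means the weakest facet must clear a threshold. The facet names (`structure`, `property`), the function `select_edit`, and the threshold value are hypothetical choices made for illustration only.

```python
import math
from typing import Dict, List, Optional

Embedding = List[float]
FacetKeys = Dict[str, Embedding]  # facet name -> key embedding for that facet


def cosine(u: Embedding, v: Embedding) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def select_edit(query: FacetKeys, stored_edits: List[FacetKeys],
                threshold: float = 0.9) -> Optional[int]:
    """Return the index of the stored edit to activate, or None.

    Assumption: the query carries an embedding for every facet that the
    stored edits use, and every stored edit has at least one facet.
    """
    best_index, best_score = None, threshold
    for index, edit in enumerate(stored_edits):
        # The weakest facet decides: one mismatching facet blocks the edit.
        score = min(cosine(query[facet], key) for facet, key in edit.items())
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


# Toy example with hypothetical 2-D embeddings.
edits = [{"structure": [1.0, 0.0], "property": [0.0, 1.0]}]
on_target = {"structure": [0.99, 0.05], "property": [0.02, 1.0]}
partial = {"structure": [1.0, 0.0], "property": [1.0, 0.0]}

print(select_edit(on_target, edits))  # 0: both facets match, adapter activates
print(select_edit(partial, edits))    # None: the property facet does not match
```

Taking the minimum over facets, rather than the mean, is one way to express the Locality goal: a query that resembles an edited molecule in structure but asks about an unrelated property falls through to the unedited model.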