Chem42: a Family of chemical Language Models for Target-aware Ligand Generation

📅 2025-03-20
🤖 AI Summary
Current chemical language models (cLMs) struggle to incorporate target-specific information, which limits their usefulness for target-aware de novo ligand generation. Chem42 is a family of target-aware generative cLMs that integrates target priors through cross-modal collaboration with the protein language model Prot42, enabling atom-level modeling of protein–ligand interactions. The approach comprises three components: multimodal representation learning, joint fine-tuning with Prot42, and target-conditioned molecular decoding. Evaluated across multiple protein targets, the models improve chemical validity (>98%), target-aware design, and the predicted binding affinity of generated ligands, while narrowing the search space of viable candidates. The models are openly available on Hugging Face (huggingface.co/inceptionai), and the authors report a new benchmark in molecule property prediction, conditional molecule generation, and target-aware ligand design.
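Chemical validity, as reported for generative cLMs, is usually the fraction of sampled SMILES strings that parse into a well-formed molecule. The sketch below illustrates how such a rate, together with a uniqueness rate, could be computed over a batch of samples. It is not the paper's evaluation code: `toy_is_valid` and `generation_metrics` are hypothetical names, and the checker only tests syntax (balanced branches, closed bracket atoms, paired single-digit ring closures). In practice one would pass a real parser instead, for example a function that checks whether RDKit's `Chem.MolFromSmiles` returns a molecule rather than `None`.

```python
from typing import Callable, Dict, Iterable


def toy_is_valid(smiles: str) -> bool:
    """Toy syntactic check (an assumption, not a chemistry-aware parser)."""
    if not smiles:
        return False
    depth = 0            # number of currently open '(' branches
    in_bracket = False   # True while inside a '[...]' bracket atom
    open_rings = set()   # single-digit ring-closure labels still open
    for ch in smiles:
        if in_bracket:
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False  # nested bracket atoms are not allowed
            continue          # digits inside brackets are not ring labels
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            return False      # closing bracket without an opening one
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # branch closed before it was opened
        elif ch.isdigit():
            open_rings ^= {ch}  # first occurrence opens, second closes
    return depth == 0 and not in_bracket and not open_rings


def generation_metrics(
    samples: Iterable[str], is_valid: Callable[[str], bool]
) -> Dict[str, float]:
    """Return validity (valid / total) and uniqueness (distinct valid / valid)."""
    samples = list(samples)
    valid = [s for s in samples if is_valid(s)]
    validity = len(valid) / len(samples) if samples else 0.0
    # A real pipeline would canonicalize SMILES before counting distinct strings.
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return {"validity": validity, "uniqueness": uniqueness}


if __name__ == "__main__":
    batch = ["CCO", "c1ccccc1", "C1CC", "CC(=O", "CCO"]
    print(generation_metrics(batch, toy_is_valid))
```

Passing the checker in as a parameter keeps the metric code independent of any particular toolkit, so the toy function can be swapped for a chemistry-aware one without changing `generation_metrics`.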

📝 Abstract
Revolutionizing drug discovery demands more than just understanding molecular interactions: it requires generative models that can design novel ligands tailored to specific biological targets. While chemical Language Models (cLMs) have made strides in learning molecular properties, most fail to incorporate target-specific insights, restricting their ability to drive de novo ligand generation. Chem42, a cutting-edge family of generative chemical Language Models, is designed to bridge this gap. By integrating atomic-level interactions with multimodal inputs from Prot42, a complementary protein Language Model, Chem42 achieves a sophisticated cross-modal representation of molecular structures, interactions, and binding patterns. This innovative framework enables the creation of structurally valid, synthetically accessible ligands with enhanced target specificity. Evaluations across diverse protein targets confirm that Chem42 surpasses existing approaches in chemical validity, target-aware design, and predicted binding affinity. By reducing the search space of viable drug candidates, Chem42 could accelerate the drug discovery pipeline, offering a powerful generative AI tool for precision medicine. Our Chem42 models set a new benchmark in molecule property prediction, conditional molecule generation, and target-aware ligand design. The models are publicly available at huggingface.co/inceptionai.

Problem

Research questions and friction points this paper is trying to address.

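Most cLMs learn molecular properties but fail to incorporate target-specific information, which limits de novo ligand generation
Atomic-level interactions must be combined with protein representations to capture binding patterns
Generated ligands need to be chemically valid, synthetically accessible, and target-specific

Innovation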

Methods, ideas, or system contributions that make the work stand out.

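Integrates atomic-level interactions with multimodal inputs from the protein language model Prot42
Generates target-specific, synthetically accessible ligands
Surpasses existing approaches in chemical validity, target-aware design, and predicted binding affinity of generated ligands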
Aahan Singh
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.
Engin Tekin
Cerebras Systems, Sunnyvale, CA, USA.
Maryam Nadeem
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.
Nancy A Elnaker
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.
Mohammad Amaan Sayeed
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.
Natalia Vassilieva
Sr. Director of Product, Cerebras Systems
image analysis, information retrieval, information extraction, machine learning, natural language processing
B. Amor
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.