Chem42: a Family of chemical Language Models for Target-aware Ligand Generation

📅 2025-03-20

🤖 AI Summary
Current chemical language models (cLMs) struggle to incorporate target-specific information, which limits their usefulness for target-aware de novo ligand generation. To address this, the authors introduce Chem42, a family of target-aware generative cLMs that integrates target information through cross-modal collaboration with the protein language model Prot42, enabling atom-level modeling of protein–ligand interactions. The approach comprises three components: multimodal representation learning, joint fine-tuning with Prot42, and target-conditioned molecular decoding. Evaluated across multiple protein targets, the models improve on existing approaches in chemical validity (>98%), target-aware design, and predicted binding affinity, while narrowing the search space of viable candidates. The models are publicly available on Hugging Face (huggingface.co/inceptionai) and, according to the authors, set a new benchmark in molecule property prediction, conditional molecule generation, and target-aware ligand design.

📝 Abstract
Revolutionizing drug discovery demands more than just understanding molecular interactions - it requires generative models that can design novel ligands tailored to specific biological targets. While chemical Language Models (cLMs) have made strides in learning molecular properties, most fail to incorporate target-specific insights, restricting their ability to drive de-novo ligand generation. Chem42, a cutting-edge family of generative chemical Language Models, is designed to bridge this gap. By integrating atomic-level interactions with multimodal inputs from Prot42, a complementary protein Language Model, Chem42 achieves a sophisticated cross-modal representation of molecular structures, interactions, and binding patterns. This innovative framework enables the creation of structurally valid, synthetically accessible ligands with enhanced target specificity. Evaluations across diverse protein targets confirm that Chem42 surpasses existing approaches in chemical validity, target-aware design, and predicted binding affinity. By reducing the search space of viable drug candidates, Chem42 could accelerate the drug discovery pipeline, offering a powerful generative AI tool for precision medicine. Our Chem42 models set a new benchmark in molecule property prediction, conditional molecule generation, and target-aware ligand design. The models are publicly available at huggingface.co/inceptionai.
Problem

Research questions and friction points this paper is trying to address.

Develops target-specific generative models for novel ligands
Integrates atomic interactions with protein data for binding patterns
Enhances chemical validity and target-aware ligand design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates atomic-level interactions with multimodal inputs from the protein language model Prot42
Generates target-specific, synthetically accessible ligands
Surpasses existing methods in the predicted binding affinity of generated ligands
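
Chemical validity, one of the metrics named above, is commonly reported as the fraction of generated SMILES strings that can be parsed into a molecule. The sketch below shows that calculation only; it is not the paper's evaluation code. The function names `crude_smiles_check` and `validity_rate` are hypothetical, and the default validator is a deliberately crude stand-in that checks delimiter balance and ring-closure pairing, not real chemistry. A real evaluation would pass in a toolkit-backed check instead, for example one based on RDKit's `Chem.MolFromSmiles`, which returns `None` for unparseable input.

```python
from typing import Callable, Iterable


def crude_smiles_check(smiles: str) -> bool:
    """Hypothetical stand-in validator (assumption, not the paper's method).

    Checks only that branch parentheses and atom brackets are balanced and
    that every ring-closure digit is opened and closed. It does NOT check
    valence, aromaticity, or element symbols.
    """
    if not smiles:
        return False
    depth = 0              # nesting depth of branch parentheses
    in_bracket = False     # True while inside a [...] atom specification
    open_rings = set()     # ring-closure digits currently left open
    for ch in smiles:
        if in_bracket:
            if ch == "[":
                return False          # nested '[' is never legal
            if ch == "]":
                in_bracket = False
            continue                  # digits inside [...] are not ring labels
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            return False              # ']' without a matching '['
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False          # ')' without a matching '('
        elif ch.isdigit():
            # First occurrence opens the ring label, second one closes it.
            open_rings.symmetric_difference_update({ch})
    return depth == 0 and not in_bracket and not open_rings


def validity_rate(
    smiles_list: Iterable[str],
    is_valid: Callable[[str], bool] = crude_smiles_check,
) -> float:
    """Fraction of strings accepted by `is_valid`; 0.0 for an empty input."""
    items = list(smiles_list)
    if not items:
        return 0.0
    return sum(1 for s in items if is_valid(s)) / len(items)


if __name__ == "__main__":
    # Illustrative inputs only, not model outputs.
    samples = ["CCO", "c1ccccc1", "CC(=O)O", "C1CC", "CC(C", "[NH4+]"]
    print(f"validity rate: {validity_rate(samples):.2f}")
```

Keeping the validator as a parameter means the same rate calculation can be reused unchanged once a proper cheminformatics parser is available.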
Aahan Singh
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.
Engin Tekin
Cerebras Systems, Sunnyvale, CA, USA.
Maryam Nadeem
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.
Nancy A Elnaker
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.
Mohammad Amaan Sayeed
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.
Natalia Vassilieva
Sr. Director of Product, Cerebras Systems
image analysis, information retrieval, information extraction, machine learning, natural language processing
B. Amor
Inception Institute of Artificial Intelligence, Abu Dhabi, UAE.