OwkinZero: Accelerating Biological Discovery with AI

📅 2025-08-22
🤖 AI Summary
Current large language models (LLMs) perform poorly on core biomedical reasoning tasks, including target druggability assessment, therapeutic modality matching, and drug perturbation effect prediction, which hinders progress in translational medicine. To address this, we propose a verifiability-guided reinforcement learning framework that post-trains open-source small-scale LLMs on a purpose-built dataset of over 300,000 verifiable biomedical question-answer pairs, yielding the OwkinZero models. Our method substantially improves cross-task generalization and shows, for the first time, small specialized models outperforming larger commercial LLMs on standardized biomedical reasoning benchmarks. A variant trained on a mixture of the datasets further attains state-of-the-art performance across all evaluated tasks. This work establishes a new paradigm for AI-driven biological discovery: efficient, verifiable, and reproducible.

📝 Abstract
While large language models (LLMs) are rapidly advancing scientific research, they continue to struggle with core biological reasoning tasks essential for translational and biomedical discovery. To address this limitation, we created and curated eight comprehensive benchmark datasets comprising over 300,000 verifiable question-and-answer pairs, each targeting critical challenges in drug discovery including target druggability, modality suitability, and drug perturbation effects. Using this resource, we developed the OwkinZero models by post-training open-source LLMs through a Reinforcement Learning from Verifiable Rewards strategy. Our results demonstrate that specialized 8-32B OwkinZero models substantially outperform larger, state-of-the-art commercial LLMs on these biological benchmarks. Remarkably, we uncover evidence of a key aspect of generalization: specialist models trained on a single task consistently outperform their base models on previously unseen tasks. This generalization effect is further amplified in our comprehensive OwkinZero models, which were trained on a mixture of datasets and achieve even broader cross-task improvements. This study represents a significant step toward addressing the biological reasoning blind spot in current LLMs, demonstrating that targeted reinforcement learning on carefully curated data can unlock generalizable performance in specialized models, thereby accelerating AI-driven biological discovery.
Problem

Research questions and friction points this paper is trying to address.

Addressing biological reasoning limitations in large language models
Creating specialized benchmarks for drug discovery challenges
Enhancing AI generalization in biomedical tasks through reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning from Verifiable Rewards strategy
Specialized 8-32B models outperform larger commercial LLMs
Trained on curated datasets for biological reasoning tasks
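The core of the Reinforcement Learning from Verifiable Rewards strategy is that every training question has a programmatically checkable answer, so the reward signal needs no learned reward model. The sketch below illustrates the idea with a normalized exact-match reward; the function name, the example questions, and the scoring rule are illustrative assumptions, not the paper's actual implementation.

```python
def verifiable_reward(model_answer: str, gold_answer: str) -> float:
    """Return 1.0 if the normalized answers match exactly, else 0.0.

    Illustrative reward for RLVR-style post-training: because each
    benchmark QA pair has a known correct answer, the reward can be
    computed by direct comparison rather than by a reward model.
    """
    def normalize(s: str) -> str:
        # Lowercase and collapse whitespace so trivial formatting
        # differences do not affect the reward.
        return " ".join(s.strip().lower().split())

    return 1.0 if normalize(model_answer) == normalize(gold_answer) else 0.0


# Hypothetical drug-discovery QA pairs in the spirit of the benchmarks
# (target druggability, modality suitability):
qa_pairs = [
    ("Is EGFR druggable? Answer yes or no.", "yes"),
    ("Which modality suits an intracellular target?", "small molecule"),
]
model_outputs = ["Yes", "antibody"]

rewards = [
    verifiable_reward(output, gold)
    for (_, gold), output in zip(qa_pairs, model_outputs)
]
print(rewards)  # [1.0, 0.0]
```

In a full RLVR pipeline these per-answer rewards would feed a policy-gradient update (e.g. PPO- or GRPO-style) over the post-trained LLM; only the verifiable scoring step is shown here.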