Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited reliability of general-purpose vision-language models (VLMs) in biomedicine, which stems from their inability to integrate multimodal evidence scattered across figures, tables, captions, and main text. The authors propose Ryze, a system that fully automates the construction of question-answering training data while preserving the complete evidential structure. Ryze combines layout-aware extraction of charts and tables with OCR error correction and large language model-driven data cleaning, and introduces a progress-gated post-training strategy that combines supervised fine-tuning with reinforcement learning. Built on Qwen3-VL-8B, the resulting BioVLM-8B model was trained for under USD 200 and achieves a weighted accuracy of 48.0% on LAB-Bench, surpassing the base model by 12.6 percentage points and GPT-5.2 by 3.8 points.
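To make the reported numbers concrete, here is a minimal sketch of how a weighted accuracy could be computed and what the stated gaps imply for the two comparison models. It assumes "weighted" means weighting each subtask by its number of questions; the summary does not define the weighting, so treat that as an assumption. The per-subtask counts are hypothetical and are not results from the paper.

```python
def weighted_accuracy(per_task):
    """Return accuracy in percent, weighting each subtask by its question count.

    per_task: list of (num_correct, num_questions) tuples, one per subtask.
    Assumption: question-count weighting; the source does not specify it.
    """
    total_correct = sum(correct for correct, _ in per_task)
    total_questions = sum(count for _, count in per_task)
    if total_questions == 0:
        raise ValueError("per_task contains no questions")
    return 100.0 * total_correct / total_questions


# Hypothetical per-subtask counts, chosen only to illustrate the formula.
example_counts = [(30, 50), (18, 50)]
example_score = weighted_accuracy(example_counts)

# Scores implied by the reported gaps (derived by subtraction, not quoted).
biovlm_score = 48.0
implied_base_score = round(biovlm_score - 12.6, 1)   # Qwen3-VL-8B
implied_gpt_score = round(biovlm_score - 3.8, 1)     # GPT-5.2
```

Under these figures, the base model would sit at about 35.4% and GPT-5.2 at about 44.2% weighted accuracy on LAB-Bench.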

📝 Abstract

General-purpose VLMs remain unreliable for biomedical research because valid answers in scientific papers depend on evidence split across figures, tables, charts, captions, and referring text. Existing post-training pipelines are bottlenecked by costly expert annotation and by synthetic data that drops this evidence structure. We present Ryze, a fully automated system that converts raw biomedical papers into an evidence-enriched training set and a domain-specialized VLM. Ryze synthesizes QA pairs with complete supporting evidence (visual element, caption, extracted structure, and referring paragraphs), reduces layout and OCR errors via chart/table-aware extraction and LLM-based cleansing, and applies a progress-gated post-training strategy combining supervised fine-tuning with reinforcement learning. Starting from Qwen3-VL-8B, Ryze produces BioVLM-8B at under USD 200, achieving 48.0% weighted accuracy on LAB-Bench, outperforming the base model by +12.6 percentage points (pp) and surpassing GPT-5.2 by +3.8 pp. We release Ryze as open source together with the trained BioVLM-8B model.
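The evidence structure named in the abstract (visual element, caption, extracted structure, referring paragraphs) can be pictured as a single record per QA pair. The sketch below is illustrative only: the class name `EvidenceQA`, its field names, and the `missing_evidence` check are hypothetical and are not taken from the Ryze codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvidenceQA:
    """One synthesized QA pair with its supporting evidence (hypothetical schema)."""
    question: str
    answer: str
    visual_element: str       # e.g. path to the cropped figure or table image
    caption: str              # caption text attached to the visual element
    extracted_structure: str  # e.g. table cells or chart values as text
    referring_paragraphs: List[str] = field(default_factory=list)

    def missing_evidence(self) -> List[str]:
        """Return the names of evidence fields that are empty."""
        missing = []
        if not self.visual_element.strip():
            missing.append("visual_element")
        if not self.caption.strip():
            missing.append("caption")
        if not self.extracted_structure.strip():
            missing.append("extracted_structure")
        if not any(p.strip() for p in self.referring_paragraphs):
            missing.append("referring_paragraphs")
        return missing


# Placeholder content, not drawn from any real paper.
record = EvidenceQA(
    question="Which condition shows the highest expression?",
    answer="Condition B",
    visual_element="figures/fig2_panel_a.png",
    caption="Figure 2a. Expression across conditions.",
    extracted_structure="condition,expression\nA,1.0\nB,2.4",
    referring_paragraphs=[],
)
gaps = record.missing_evidence()
```

A filter of this kind would let a pipeline discard QA pairs whose evidence is incomplete; here `gaps` flags the record because no referring paragraph is attached.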

Problem

Research questions and friction points this paper is trying to address.

biomedical research

visual language models

evidence structure

data synthesis

scientific papers

Innovation

Methods, ideas, or system contributions that make the work stand out.

evidence-enriched data synthesis

biomedical VLM

chart/table-aware extraction