Ryze: Evidence-Enriched Data Synthesis from Biomedical Papers

📅 2026-05-30

🤖 AI Summary
This work addresses the limited reliability of general-purpose vision-language models (VLMs) in biomedicine, which stems from their inability to integrate multimodal evidence scattered across figures, tables, captions, and main text. The authors propose Ryze, a fully automated system that constructs question-answering training data while preserving the complete evidence structure of each example. Ryze combines chart- and table-aware extraction with LLM-based cleansing of layout and OCR errors, and applies a progress-gated post-training strategy that combines supervised fine-tuning with reinforcement learning. Built on Qwen3-VL-8B, the resulting BioVLM-8B model was trained for under USD 200 and achieves 48.0% weighted accuracy on LAB-Bench, 12.6 percentage points above the base model and 3.8 points above GPT-5.2.
📝 Abstract
General-purpose VLMs remain unreliable for biomedical research because valid answers in scientific papers depend on evidence split across figures, tables, charts, captions, and referring text. Existing post-training pipelines are bottlenecked by costly expert annotation and by synthetic data that drops this evidence structure. We present Ryze, a fully automated system that converts raw biomedical papers into an evidence-enriched training set and a domain-specialized VLM. Ryze synthesizes QA pairs with complete supporting evidence (visual element, caption, extracted structure, and referring paragraphs), reduces layout and OCR errors via chart/table-aware extraction and LLM-based cleansing, and applies a progress-gated post-training strategy combining supervised fine-tuning with reinforcement learning. Starting from Qwen3-VL-8B, Ryze produces BioVLM-8B at under USD 200, achieving 48.0% weighted accuracy on LAB-Bench, outperforming the base model by +12.6 percentage points (pp) and surpassing GPT-5.2 by +3.8 pp. We release Ryze as open source together with the trained BioVLM-8B model.
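The abstract lists four evidence parts attached to each synthesized QA pair: the visual element, its caption, the extracted structure, and the referring paragraphs. A minimal sketch of such a record, with a completeness filter, might look like the following. The class name `EvidenceRecord`, its field names, and the helper `keep_complete` are hypothetical and chosen for illustration only; they are not taken from the Ryze code base.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceRecord:
    """Hypothetical container: one QA pair plus the four evidence parts."""
    question: str
    answer: str
    visual_element_path: Optional[str] = None   # cropped figure, table, or chart image
    caption: Optional[str] = None
    extracted_structure: Optional[str] = None   # e.g. table cells or chart values as text
    referring_paragraphs: List[str] = field(default_factory=list)

    def missing_evidence(self) -> List[str]:
        """Return the names of evidence parts that are absent or empty."""
        parts = {
            "visual_element": self.visual_element_path,
            "caption": self.caption,
            "extracted_structure": self.extracted_structure,
            "referring_paragraphs": self.referring_paragraphs,
        }
        return [name for name, value in parts.items() if not value]


def keep_complete(records: List[EvidenceRecord]) -> List[EvidenceRecord]:
    """Keep only records that carry all four evidence parts (assumed filter rule)."""
    return [r for r in records if not r.missing_evidence()]
```

Dropping incomplete records is only one possible policy; a pipeline could instead try to repair them, and the abstract does not say which approach Ryze takes.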
Problem

Research questions and friction points this paper is trying to address.

biomedical research
vision-language models
evidence structure
data synthesis
scientific papers
Innovation

Methods, ideas, or system contributions that make the work stand out.

evidence-enriched data synthesis
biomedical VLM
chart/table-aware extraction
progress-gated post-training
automated QA generation