🤖 AI Summary
Current biomedical foundation models (BioFMs) operate in embedding spaces disjoint from those of large language models (LLMs), hindering effective cross-modal reasoning. To address this, we propose a two-stage semantic alignment framework: first, leveraging pretrained BioFMs as modality encoders; second, employing lightweight, modality-specific projection layers to map diverse biomedical modalities (biomedical text, molecular structures, and single-cell representations) into the LLM's embedding space, without requiring any LLM fine-tuning. Combined with instruction tuning, our approach supports cross-modal question answering, zero-shot cell-type annotation, and interpretable dialogue. Experiments demonstrate superior performance over larger LLM baselines across multiple biomedical tasks, with significant gains in reasoning accuracy and output interpretability. To our knowledge, this is the first work to enable plug-and-play multimodal biomedical joint reasoning that is broadly compatible across LLMs without adapting the LLM itself.
📝 Abstract
Recent advances in large language models (LLMs) and biomedical foundation models (BioFMs) have achieved strong results in biological text reasoning, molecular modeling, and single-cell analysis, yet they remain siloed in disjoint embedding spaces, limiting cross-modal reasoning. We present BIOVERSE (Biomedical Vector Embedding Realignment for Semantic Engagement), a two-stage approach that adapts pretrained BioFMs as modality encoders and aligns them with LLMs through lightweight, modality-specific projection layers. The approach first aligns each modality to a shared LLM space through independently trained projections, allowing them to interoperate naturally, and then applies standard instruction tuning with multimodal data to bring them together for downstream reasoning. By unifying raw biomedical data with knowledge embedded in LLMs, the approach enables zero-shot annotation, cross-modal question answering, and interactive, explainable dialogue. Across tasks spanning cell-type annotation, molecular description, and protein function reasoning, compact BIOVERSE configurations surpass larger LLM baselines while enabling richer, generative outputs than existing BioFMs, establishing a foundation for principled multi-modal biomedical reasoning.
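The core mechanism described above can be sketched minimally: a frozen BioFM produces an embedding, and a small trainable projection maps it into the LLM's embedding dimension, where it can serve as a soft prompt token alongside ordinary text embeddings. The pure-Python sketch below is an illustration only, not BIOVERSE's actual implementation: the dimensions are hypothetical, and random weights stand in for the trained projection.

```python
import random

def make_projection(in_dim, out_dim, seed=0):
    """Build a linear projection matrix (out_dim x in_dim).

    In the described framework these weights would be trained during
    stage-one alignment; here they are random placeholders.
    """
    rng = random.Random(seed)
    scale = in_dim ** -0.5  # keep output magnitudes roughly stable
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]

def project(embedding, weights):
    """Map a frozen BioFM embedding into the LLM embedding space."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weights]

# Hypothetical dimensions: a 16-d single-cell encoder output projected
# into a 32-d LLM embedding space. The BioFM and LLM stay frozen; only
# the projection would be trained.
biofm_emb = [0.1] * 16
W = make_projection(in_dim=16, out_dim=32)
soft_token = project(biofm_emb, W)
assert len(soft_token) == 32  # now usable as one soft prompt token
```

Because each modality gets its own independently trained projection of this shape, new modalities can be attached to the same frozen LLM without retraining anything else, which is what makes the design plug-and-play.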