Wisdom Ikezogwo

Google Scholar ID: gt5I_iYAAAAJ
PhD Student, University of Washington
Multimodal Understanding, Vision, Representation Learning, Medical Image Analysis
Citations & Impact (all-time)
  • Citations: 366
  • H-index: 5
  • i10-index: 4
  • Publications: 12
  • Co-authors: 7
Academic Achievements
  • 1. PathFinder: A four-agent AI system for histopathology that outperforms both SOTA methods (+8%) and human pathologists (+9%) in melanoma diagnosis while providing explainable results.
  • 2. MedicalNarratives: A dataset of 4.7M image-text pairs from medical videos that aligns narrators' speech with their mouse movements. A GenMedCLIP model trained on it achieves SOTA across 12 medical domains.
  • 3. Quilt-LLaVA: Addresses challenges in histopathology by extracting localized narratives from open-source histopathology videos and using them for visual instruction tuning.
Research Experience
  • PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
  • MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives
  • Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Education
  • I am a 4th-year PhD student in Computer Science & Engineering at the University of Washington, where I work with Prof. Ranjay Krishna and Prof. Linda Shapiro. Previously, I received my B.Sc. from Obafemi Awolowo University, where I was fortunate to work with Prof. Kayode P. Ayodele.
Background
  • Research Interests: My research aims to advance multimodal representation and generative modeling through effective alignment strategies. On the data side, I study large-scale data-curation methods that connect co-occurring modalities (e.g., vision and language) across domains, producing multimodal datasets that enable large-scale training without expensive annotation. On the model side, I focus on multimodal reasoning, developing multi-agent frameworks and foundation models aligned with human experts, and on improving image and video generative models by aligning them to expected behavior, with a particular interest in physics-informed approaches that improve temporal consistency and physical plausibility.