Wisdom Ikezogwo

Google Scholar ID: gt5I_iYAAAAJ
PhD Student, University of Washington
Multimodal Understanding, Vision, Representation Learning, Medical Image Analysis
Citations & Impact (all-time)
  • Citations: 366
  • H-index: 5
  • i10-index: 4
  • Publications: 12
  • Co-authors: 7
Academic Achievements
  • 1. PathFinder: A four-agent AI system for histopathology that outperforms both SOTA methods (+8%) and human pathologists (+9%) in melanoma diagnosis while providing explainable results.
  • 2. MedicalNarratives: A dataset of 4.7M image-text pairs from medical videos that aligns narrators' speech with their mouse movements. A GenMedCLIP model trained on it achieves SOTA across 12 medical domains.
  • 3. Quilt-LLaVA: Addresses challenges in histopathology by extracting localized narratives from open-source histopathology videos and using them for visual instruction tuning.
Research Experience
  • PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
  • MedicalNarratives: Connecting Medical Vision and Language with Localized Narratives
  • Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Education
  • I am a 4th-year PhD student in Computer Science & Engineering at the University of Washington, where I work with Prof. Ranjay Krishna and Prof. Linda Shapiro. Previously, I received my B.Sc. from Obafemi Awolowo University, where I was fortunate to work with Prof. Kayode P. Ayodele.
Background
  • Research Interests: My research aims to advance multimodal representation and generative modeling through effective alignment strategies. On the data side, I study large-scale data-curation methods that connect co-occurring modalities (e.g., vision and language) across domains, producing multimodal datasets that enable large-scale training without expensive annotation. On the model side, I focus on multimodal reasoning, developing multi-agent frameworks and foundation models aligned with human experts, and on improving image and video generative models by aligning them to expected behavior, with a particular interest in physics-informed approaches that improve temporal consistency and physical plausibility.