Publications:
- YOLO-Count: Differentiable Object Counting for Text-to-Image Generation, ICCV 2025
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs, NeurIPS 2024, ECCV Workshop on Emergent Visual Abilities and Limits of Foundation Models
- Improving Language Understanding from Screenshots, Preprint 2024
- Language Models as Science Tutors, ICML 2024
- TokenCompose: Grounding Diffusion with Token-level Supervision, CVPR 2024
- OmniControlNet: Dual-stage Integration for Conditional Image Generation, CVPR 2024, Workshop in Generative Models for Computer Vision
Awards: Top Reviewer in NeurIPS 2024
Nominations: Nominated for Siebel Scholar 2025
Research Experience
Joined UC Berkeley as a Ph.D. student on August 26, 2025; worked on multimodal pre-training, reasoning, and evaluation during his time at Princeton University.
Education
- Ph.D. student in EECS at UC Berkeley, advised by Prof. Joseph Gonzalez, Prof. Trevor Darrell, and Prof. Ion Stoica
- M.S.E. in Computer Science from Princeton University, advised by Prof. Danqi Chen
- B.S. in Data Science from the Halicioglu Data Science Institute (HDSI) and B.A. in Cognitive Science from the CogSci Department at the University of California, San Diego (UCSD), advised by Prof. Zhuowen Tu and Prof. Zhiting Hu
Background
Research Interests: grounded decision making, reasoning, and planning in large multimodal models
Field: Electrical Engineering and Computer Sciences (EECS)