Published 'Polos: Multimodal Metric Learning from Human Feedback for Image Captioning' at CVPR 2024 (top 3.6%)
Published 'VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions' at EMNLP 2025 (main conference)
Co-first author of 'DialMAT' at CVPR 2023 Embodied AI Workshop
Published 'DENEB' at ACCV 2024 and 'JaSPICE' at CoNLL 2023
Multiple publications at IEEE/RSJ IROS 2023, IEEE ICASSP 2022, ACCV 2022, and other international venues
Oral presentations at major Japanese academic conferences including MVA, IPSJ, JSAI, and the Meeting on Image Recognition and Understanding (MIRU)
AtCoder highest rating: 1545; GitHub total stars: 577
Background
Research interests include multimodal learning, image captioning, automatic evaluation metrics, hallucination detection and editing, and the integration of large language models with vision foundation models.
Currently a first-year Ph.D. student (D1) in the Department of Computer Science, Faculty of Science and Technology, Keio University.
Visiting Scholar at Carnegie Mellon University.
Recipient of the JSPS DC1 Fellowship.
Selected as a 'Super Creator' in the 2024 MITOU IT Program under mentor Yoichi Ochiai.