Yunxin Li
Scholar

Google Scholar ID: U98QY0QAAAAJ
Harbin Institute of Technology (Shenzhen)
Multimodal Reasoning, Large Models, AI Agents
Citations & Impact
All-time
Citations: 583
H-index: 13
i10-index: 14
Publications: 20
Co-authors: 9 (list available)
Resume (English only)
Academic Achievements
  • Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models (Survey, arXiv, 2025)
  • Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts (IEEE TPAMI, 2025)
  • Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation (SIGGRAPH Asia, 2024)
  • VideoVista: A Versatile Benchmark for Video Understanding and Reasoning (arXiv, 2024)
  • Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment (ACL 2024 Main Conference)
  • VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context (ICML, 2024)
  • LMEye: An Interactive Perception Network for Large Language Models (IEEE Transactions on Multimedia (TMM), 2024)
  • A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation (LREC-COLING, 2024)
  • Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs (arXiv, 2023)
  • A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering (Technical Paper, 2023)
  • Training Multimedia Event Extraction With Generated Images and Captions (ACM Multimedia (ACM MM), 2023)
  • A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text (ACL 2023 Main Conference)
Research Experience
  • HKUST Research Assistant (2025.03 - 2025.08)
  • ByteDance Doubao (Seed) Team (2024.10 - 2025.02)
  • Tencent AILab (2024.04 - 2024.08)
  • Tencent PCG (2021.10 - 2022.06)
Education
  • Ph.D.: Harbin Institute of Technology, Shenzhen. Advisors: Prof. Baotian Hu, Prof. Yuxin Ding, Prof. Min Zhang
  • Master of Engineering: Harbin Institute of Technology, Shenzhen
  • Bachelor of Science: Harbin Institute of Technology
Background
  • Research interests include multimodal collaborative reasoning, video understanding and generation, multimodal agents, and embodied intelligence. The long-term goal is to build more capable artificial intelligence that helps humans, with the aspiration of creating an intelligent metaverse.
Miscellany
  • Long-term collaborations with Dr. Lin Ma (Meituan, Beijing), Prof. Wenhan Luo (HKUST), Dr. Longyue Wang (Alibaba Group), and Yuxiang Wu (University College London).