- Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models (Survey, arXiv, 2025)
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts (IEEE TPAMI, 2025)
- Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation (SIGGRAPH Asia, 2024)
- VideoVista: A Versatile Benchmark for Video Understanding and Reasoning (arXiv, 2024)
- Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment (ACL 2024 Main Conference)
- VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context (ICML, 2024)
- LMEye: An Interactive Perception Network for Large Language Models (IEEE Transactions on Multimedia (TMM), 2024)
- A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation (LREC-COLING, 2024)
- Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs (arXiv, 2023)
- A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering (Technical Paper, 2023)
- Training Multimedia Event Extraction With Generated Images and Captions (ACM Multimedia (ACM MM), 2023)
- A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text (ACL 2023 Main Conference)
Research Experience
- HKUST Research Assistant (2025.03 - 2025.08)
- ByteDance Doubao (Seed) Team (2024.10 - 2025.02)
- Tencent AILab (2024.04 - 2024.08)
- Tencent PCG (2021.10 - 2022.06)
Education
- Ph.D.: Harbin Institute of Technology, Shenzhen (Advisors: Prof. Baotian Hu, Prof. Yuxin Ding, Prof. Min Zhang)
- Master of Engineering: Harbin Institute of Technology, Shenzhen
- Bachelor of Science: Harbin Institute of Technology
Background
Research interests include multimodal collaborative reasoning, video understanding and generation, multimodal agents, and embodied intelligence. The long-term goal is to build more capable artificial intelligence that helps humans, with the aspiration of creating an intelligent metaverse.
Miscellany
Long-term collaborators include Dr. Lin Ma (Meituan, Beijing), Prof. Wenhan Luo (HKUST), Dr. Longyue Wang (Alibaba Group), and Yuxiang Wu (University College London).