Chaoyou Fu

Google Scholar ID: 4A1xYQwAAAAJ

Nanjing University

Multimodal LLM, LLM, Biometrics

Citations & Impact

Citations (all-time): 4,869

Publications

38 items

SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation

2026

Tango: Taming Visual Signals for Efficient Video Large Language Models

2026

ActFER: Agentic Facial Expression Recognition via Active Tool-Augmented Visual Reasoning

2026

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

2026

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

2026

Benchmarking PhD-Level Coding in 3D Geometric Computer Vision

2026

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

2026

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

2026

Resume

Academic Achievements

Pioneered the VITA series of multimodal LLMs (VITA-1.0/-1.5, Long-VITA, VITA-Audio, VITA-VLA, VITA-E)
Developed the MME benchmark series (MME, Video-MME, MME-RealWorld) for multimodal LLM evaluation
Founded the Awesome-MLLM community
Serves as Associate Editor for Pattern Recognition and as Area Chair for ICLR
Member of the CSIG Youth Committee and of the Executive Committees of CCF-AI and CCF-CV
Awards: CAS President’s Special Award; IEEE Biometrics Council Best Doctoral Dissertation Award; WAIC Yunfan Award; Xiaomi Young Scholar - Technology Innovation Award; Beijing Outstanding PhD Dissertation; CAS Outstanding PhD Dissertation; CVPR 2023 Outstanding Reviewer
Published in top venues including NeurIPS, CVPR, TPAMI, and National Science Review
Open-source projects are widely recognized (e.g., VITA-1.5 with 2k+ GitHub Stars, MLLM Survey with 10k+ GitHub Stars)

Background

Researcher, Assistant Professor, and PhD Supervisor at the School of Intelligent Science and Technology, Nanjing University
Selected for the China Association for Science and Technology's 'Young Talent Support Program'
Leading the Multimodal Intelligence Group (NJU-MiG) at Nanjing University
Research focuses on Multimodal Large Language Models (Multimodal LLMs) and Large Language Models (LLMs)
Over 5,600 citations on Google Scholar, with a single first-author paper exceeding 1,000 citations
Open-source projects have accumulated over 20,000 GitHub Stars

Co-authors

5 total

Institute of Automation, Chinese Academy of Sciences

Liang Wang

National Lab of Pattern Recognition