Yi Liu (刘熠)

Google Scholar ID: gGPehK4AAAAJ
Honor Device Co., Ltd
Deep Learning · Video Understanding
Citations & Impact
All-time
  • Citations: 1,380
  • h-index: 7
  • i10-index: 6
  • Publications: 14
  • Co-authors: 9
Academic Achievements
  • Publications:
    - MagicVL-2B: Empowering Vision-Language Models on Mobile Devices with Lightweight Visual Encoders via Curriculum Learning, arXiv, 2025 (AAAI 2026 under review, 1st author)
    - MagicGen: A Universal Multimodal Data Synthesis Agent for Domain-Specific Vision-Language Model Tuning, arXiv, 2025 (in progress, 1st corresponding author)
    - E-VRAG: Enhancing Long Video Understanding with Resource-Efficient Retrieval Augmented Generation, arXiv, 2025 (AAAI 2026 under review, 1st corresponding author)
    - VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking, arXiv, 2025 (NeurIPS 2025 under review, 2nd corresponding author)
    - LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering, International Journal of Computer Vision (IJCV), 2025 (CAS Tier 1, IF=9.3, co-first author, 3rd position)
    - MLLM-TA: Leveraging Multimodal Large Language Models for Precise Temporal Video Grounding, IEEE Signal Processing Letters (SPL), 2024 (CAS Tier 2, IF=3.9, 1st author)
    - MVBench: A Comprehensive Multi-modal Video Understanding Benchmark, Computer Vision and Pattern Recognition (CVPR), 2024 (CCF-A, 6th author)
    - F2S-Net: Learning Frame-To-Segment Prediction for Online Action Detection, Journal of Real-Time Image Processing (JRTIP), 2024 (CAS Tier 3, IF=3.0, 1st author)
    - Dual masked modeling for weakly-supervised temporal boundary discovery, IEEE Transactions on Multimedia (TMM), 2023 (CAS Tier 1, IF=9.7, co-first author, 2nd position)
    - Learning Discriminative Feature Representation for Open Set Action Recognition, ACM International Conference on Multimedia (ACM MM), 2023 (CCF-A, co-first author, 2nd position)
    - InternVideo: General Video Foundation Models via Generative and Discriminative Learning, arXiv, 2022 (SCIS under review, 9th author)
    - FineAction: A Fine-Grained Video Dataset for Temporal Action Localization, IEEE Transactions on Image Processing (TIP), 2022 (CAS Tier 1, IF=13.7, 1st author)
    - VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection, International Conference on Pattern Recognition (ICPR), 2022 (CCF-C, 1st author)
    - Short video scene online start detection task and method research, Integrated Technology, 2021 (co-first author, 2nd position)
  • Awards:
    - 1st Prize, ECCV 2022 Ego4D Episodic Memory Challenge, Moments Queries Track
    - 1st Prize, ECCV 2022 Ego4D Episodic Memory Challenge, Looking At Me Track
Research Experience
  • Research intern, Shanghai AI Laboratory, 2022–2023
Education
  • Ph.D.: University of Chinese Academy of Sciences (UCAS), MMLab@SIAT, supervised by Prof. Yu Qiao and Prof. Yali Wang, 2024
  • B.Eng.: Huazhong University of Science and Technology (HUST), Wuhan, China, 2019
Background
  • Research Interests: Vision-Language Models, Video Understanding
  • About Me: I currently work at Honor Device Co., Ltd as the project leader (PL) of the On-device VLM Group, focusing on vision-language models and video understanding.
Miscellany
  • Workshop organizer:
    - ECCV 2022 DeeperAction Challenge, Track 1: Temporal Action Localization
    - ICPR 2022 VideoPipe Challenge, Track 2: Temporal Defect Localization
    - ICCV 2021 DeeperAction Challenge, Track 1: Temporal Action Localization