“ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing” accepted by ICCV 2025 (equal contribution)
“Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate” accepted by ICCV 2025
“OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation” accepted by CVPR 2024 (Highlight, top 2.8%)
“PointCAT: Contrastive Adversarial Training for Robust Point Cloud Recognition” published in IEEE TIP 2024
“Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting” accepted by ICCV 2023
“Diversity-Aware Meta Visual Prompting” accepted by CVPR 2023
Contributed to multiple open-source projects and datasets, including ScaleCap (450k high-quality long image captions), MMRC (large-scale real-world MLLM conversation benchmark), Light-A-Video (training-free video relighting), and PyramidDrop (LVLM training and inference acceleration)