- 11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis
- Visual Planning: Let's Think Only with Images
- Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes
- Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation
2025 Publications:
- Imagine while Reasoning in Space: Multimodal Visualization-of-Thought
- Large Language Models are Miscalibrated In-Context Learners
- Enriching Patent Claim Generation with European Patent Dataset
- Lost in Embeddings: Information Loss in Vision-Language Models
2024 Publications:
- TopViewRS: Vision-Language Models as Top-View Spatial Reasoners
- Semantic Map-based Generation of Navigation Instructions
2023 Publications:
- Generating Data for Symbolic Language with Large Language Models
- Binding Language Models in Symbolic Languages
2022 Publications:
- UnifiedSKG: Unifying and Multi-Ta
Research Experience
Current research focuses on:
- Evaluation with explanatory and predictive power
- Paradigms and methods for multimodal reasoning
- Potential downstream applications of multimodal reasoning
Education
- PhD (Probationary) in Computation, Cognition and Language, Language Technology Lab, University of Cambridge, 2023 - present
- Master of Philosophy in Advanced Computer Science, University of Cambridge, 2022 - 2023, Graduated with Distinction
- Bachelor of Engineering in Automation, Xi'an Jiaotong University, 2018 - 2022, Graduated with Distinction (91.88/100), Minor in Fintech (China Construction Bank - XJTU Fintech Elite Class)
Background
Research interests include language grounding and multimodal reasoning (e.g., over images and structural knowledge), with a particular focus on spatial reasoning. Currently a third-year PhD student in Computation, Cognition and Language at the Language Technology Lab, University of Cambridge, supervised by Dr. Ivan Vulić, Prof. Anna Korhonen, and Prof. Serge Belongie.