Improving Multimodal Reasoning via Worst Dimension Optimization

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing process reward models in multimodal reasoning typically employ heuristic, equal-weight rewards across constraint dimensions—such as visual grounding and logical consistency—which can compromise overall reasoning reliability when stronger dimensions mask deficiencies in weaker ones. To address this limitation, this work proposes a worst-dimension optimization strategy that integrates dimension-aware evaluation and dynamic weight adjustment within the process reward framework. By selectively reinforcing the weakest dimension along the reasoning path, the approach enhances robustness and accuracy under complex multimodal constraints. Experimental results demonstrate that this method effectively mitigates cascading failures caused by the breakdown of any single dimension, thereby preserving the integrity and reliability of the entire reasoning process.
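The masking effect described above can be illustrated with a small sketch. It is not the paper's formulation: the summary gives no formulas, so the per-dimension scores, the softmin weighting, the `temperature` parameter, and all function names are assumptions chosen only to contrast equal-weight averaging with a reward that emphasizes the weakest dimension.

```python
import math

def equal_weight_reward(scores):
    """Equal-weight average over dimensions (the baseline the summary criticizes)."""
    return sum(scores.values()) / len(scores)

def worst_dimension(scores):
    """Return the name and score of the weakest dimension for one reasoning step."""
    name = min(scores, key=scores.get)
    return name, scores[name]

def softmin_weights(scores, temperature=0.1):
    """Hypothetical dynamic weighting: lower scores receive larger weights.

    A softmin is one plausible way to shift weight toward the weakest
    dimension; the paper may use a different scheme.
    """
    logits = {k: -v / temperature for k, v in scores.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(l - m) for k, l in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def reweighted_reward(scores, temperature=0.1):
    """Weighted reward that is dominated by the weakest dimension."""
    weights = softmin_weights(scores, temperature)
    return sum(weights[k] * scores[k] for k in scores)

# Illustrative (made-up) scores for a single reasoning step.
step_scores = {"visual_grounding": 0.2, "logical_consistency": 0.95}

mean_reward = equal_weight_reward(step_scores)
weak_name, weak_score = worst_dimension(step_scores)
robust_reward = reweighted_reward(step_scores)
```

With these made-up scores, the equal-weight average sits well above the visual grounding score, while the reweighted reward stays close to it, so a grounding failure is no longer hidden by strong logical consistency.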

📝 Abstract

Multimodal reasoning requires a reasoning path that remains sound across a wide range of constraints, from visual grounding to logical consistency. However, current Process Reward Models rely on heuristically defined rewards that weight these factors equally. As a result, dominant factors can conceal failures in individual dimensions, and the validity of the overall reasoning process is not guaranteed.

Problem

Research questions and friction points this paper is trying to address.

Multimodal reasoning

Process Reward Models

Worst Dimension Optimization

Visual grounding

Logical consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Worst Dimension Optimization

Multimodal Reasoning

Process Reward Models