🤖 AI Summary
This work investigates the hypothetical reasoning capabilities of multimodal large language models (MLLMs) under predefined perturbations and identifies a pervasive failure in compositional hypothetical reasoning. To address this, we introduce MARS-Bench—the first dedicated benchmark for evaluating hypothetical reasoning in MLLMs—and propose Active Deduction (AD), a novel reinforcement learning paradigm. AD explicitly guides models through stepwise, prompt-driven composite reasoning, multimodal instruction tuning, and predefined sensitivity modeling. Crucially, AD achieves the first simultaneous improvement in both hypothetical reasoning and general-purpose question answering (QA). On MARS-Bench, it boosts hypothetical reasoning accuracy by an average of 23.6% across 12 prominent open- and closed-source MLLMs, without degrading general QA performance. Furthermore, AD provides an interpretable framework for analyzing reasoning trajectories, enabling transparent diagnosis and validation of hypothetical inference processes.
📝 Abstract
Recently, Multimodal Large Language Models (MLLMs) have achieved significant success across multiple disciplines due to their exceptional instruction-following capabilities and extensive world knowledge. However, whether these MLLMs possess human-like compositional reasoning abilities remains an open problem. To unveil their reasoning behaviors, we first curate a Multimodal Assumptive Reasoning Benchmark (MARS-Bench) in this paper. Interestingly, we find that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question, even though such presuppositions appear trivial to human reasoners. We also propose Active Deduction (AD), a simple yet effective reinforcement learning paradigm that encourages the model to actively perform composite deduction before reaching a final decision. Equipped with the proposed AD method, an MLLM demonstrates significant improvements in assumptive reasoning abilities without compromising its general-purpose question-answering performance. We further provide extensive evaluations of both open-source and closed-source MLLMs on MARS-Bench, along with experimental analyses of the AD method.
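To make the evaluation idea concrete, the sketch below shows one way an assumptive perturbation could be constructed and scored. The prompt template, the toy model, and the exact-match scoring rule are illustrative assumptions for this sketch, not the actual MARS-Bench protocol.

```python
# Hedged sketch: constructing a presupposition-perturbed question and scoring
# a model on the original vs. perturbed variant. All names and templates here
# are hypothetical, chosen only to illustrate the failure mode described above.

def add_presupposition(question: str, assumption: str) -> str:
    """Prefix a question with a hypothetical presupposition."""
    return f"Suppose that {assumption}. {question}"

def accuracy(preds, golds):
    """Fraction of exact-match answers."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

# Toy example: a model that ignores the stated assumption answers the base
# question correctly but fails the perturbed variant.
base_q = "What color is the apple in the image?"
perturbed_q = add_presupposition(base_q, "the apple has been painted blue")

def naive_model(question: str) -> str:
    # Always answers from surface perception, ignoring any stated assumption.
    return "red"

preds = [naive_model(base_q), naive_model(perturbed_q)]
golds = ["red", "blue"]
print(accuracy(preds, golds))  # 0.5: correct on the base question only
```

Under this toy setup, the accuracy gap between the base and perturbed questions is exactly the kind of failure the benchmark is designed to surface.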