Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning

📅 2024-04-19
📈 Citations: 9
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the hypothetical reasoning capabilities of multimodal large language models (MLLMs) under predefined perturbations and identifies a pervasive failure in compositional hypothetical reasoning. To address this, we introduce MARS-Bench—the first dedicated benchmark for evaluating hypothetical reasoning in MLLMs—and propose Active Deduction (AD), a novel reinforcement learning paradigm. AD explicitly guides models through stepwise, prompt-driven composite reasoning, multimodal instruction tuning, and predefined sensitivity modeling. Crucially, AD achieves the first simultaneous improvement in both hypothetical reasoning and general-purpose question answering (QA). On MARS-Bench, it boosts hypothetical reasoning accuracy by an average of 23.6% across 12 prominent open- and closed-source MLLMs, without degrading general QA performance. Furthermore, AD provides an interpretable framework for analyzing reasoning trajectories, enabling transparent diagnosis and validation of hypothetical inference processes.

📝 Abstract
Recently, Multimodal Large Language Models (MLLMs) have achieved significant success across multiple disciplines due to their exceptional instruction-following capabilities and extensive world knowledge. However, whether these MLLMs possess human-like compositional reasoning abilities remains an open problem. To unveil their reasoning behaviors, we first curate a Multimodal Assumptive Reasoning Benchmark (MARS-Bench) in this paper. Interestingly, we find that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question, whereas such presuppositions appear naive to human reasoning. In addition, we propose a simple yet effective method, Active Deduction (AD), a novel reinforcement learning paradigm that encourages the model to actively perform composite deduction before reaching a final decision. Equipped with the proposed AD method, an MLLM demonstrates significant improvements in assumptive reasoning abilities without compromising its general-purpose question-answering performance. We also provide extensive evaluations of both open-source and private MLLMs on MARS-Bench, along with experimental analyses of the AD method.
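The abstract describes AD as prompting the model to actively deduce before deciding. As a purely illustrative sketch (the paper does not publish this interface; `query_mllm` and both prompt templates are hypothetical stand-ins, not the authors' implementation), the idea of checking a question's presupposition before answering might look like:

```python
# Illustrative sketch only. `query_mllm` is a hypothetical stand-in for any
# MLLM chat call; it is stubbed here so the example runs standalone.

def query_mllm(image: str, prompt: str) -> str:
    """Stubbed multimodal model call; a real version would hit a model API."""
    if "Given this deduction" in prompt:
        return "There is no dog; the image shows a cat, so the premise is false."
    return "The question presupposes a dog is present, but the image shows a cat."

def active_deduction_answer(image: str, question: str) -> tuple[str, str]:
    """Two-step, prompt-driven deduction: verify the assumption, then answer."""
    # Step 1: ask the model to surface and check any presupposition.
    check = query_mllm(
        image,
        "Before answering, state any presupposition in this question and "
        f"verify it against the image: {question}",
    )
    # Step 2: answer conditioned on the deduction from step 1.
    answer = query_mllm(
        image,
        f"Given this deduction: '{check}'. Now answer: {question}",
    )
    return check, answer

check, answer = active_deduction_answer("cat.jpg", "What color is the dog?")
print(check)
print(answer)
```

The two-step structure mirrors the "look before you decide" framing: the deduction step makes a false premise explicit so the final answer can reject it rather than hallucinate a response.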
Problem

Research questions and friction points this paper is trying to address.

Assessing MLLMs' human-like compositional reasoning abilities
Addressing MLLMs' vulnerability to presuppositions in questions
Improving assumptive reasoning without general QA performance loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed MARS-Bench for multimodal assumptive reasoning
Introduced Active Deduction reinforcement learning method
Enhanced MLLMs' reasoning without performance loss
Yian Li
Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University, Shanghai, China
Wentao Tian
Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University, Shanghai, China
Yang Jiao
Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University, Shanghai, China
Jingjing Chen
Fudan University
Multimedia · Computer Vision · Machine Learning · Pattern Recognition
Yu-Gang Jiang
Professor, Fudan University. IEEE & IAPR Fellow
Video Analysis · Embodied AI · Trustworthy AI
Tianwen Qian
East China Normal University
Multimedia · Vision and Language · Embodied AI
Bin Zhu
Na Zhao