Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the susceptibility of pretrained language models to human reporting bias in textual data, which leads to distorted commonsense reasoning in zero-shot settings. To mitigate this limitation, the authors propose a novel “machine imagination” mechanism that integrates an image generator into the reasoning pipeline. By synthesizing visual signals corresponding to input text, the framework constructs a synthetic visual question answering dataset and establishes an end-to-end multimodal zero-shot reasoning architecture named Imagine. This approach effectively compensates for gaps in textual knowledge and alleviates reporting bias. Experimental results demonstrate that Imagine significantly outperforms existing zero-shot methods across multiple commonsense reasoning benchmarks and even surpasses several advanced large language models, thereby validating the efficacy of machine imagination in enhancing model generalization.
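The summary describes a pipeline that pairs textual reasoning with "imagined" visual signals: the input text is rendered into a synthetic image, and answer options are scored against both modalities. A minimal sketch of that fusion idea follows; every component here is a toy stand-in (the real system would use an actual text-to-image generator and a vision-language encoder, and the function names `embed_text`, `imagine_image`, and `score_options` are illustrative, not from the paper).

```python
# Hedged sketch of an Imagine-style scoring step. All embeddings are toy
# bag-of-letters vectors standing in for PLM / image-generator outputs.
from math import sqrt

def embed_text(text):
    # Placeholder for a PLM text encoder: character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec

def imagine_image(question):
    # Placeholder for the image generator: in the paper's framing this
    # would synthesize a picture from the question; here we simply reuse
    # the text embedding as a fake "visual" signal.
    return embed_text(question)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def score_options(question, options, alpha=0.5):
    # Fuse textual and "imagined" visual evidence per answer option,
    # returning the index of the best-scoring option.
    q_text = embed_text(question)
    q_img = imagine_image(question)
    scores = [
        alpha * cosine(q_text, embed_text(opt))
        + (1 - alpha) * cosine(q_img, embed_text(opt))
        for opt in options
    ]
    return max(range(len(options)), key=scores.__getitem__)

best = score_options("Where would you store leftover soup?",
                     ["refrigerator", "volcano"])
print(best)  # index of the chosen answer option
```

The design point being illustrated is the fusion weight `alpha`: when the textual evidence is distorted by reporting bias, the visual branch contributes an independent signal, which is the mechanism the paper credits for its zero-shot gains.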

📝 Abstract
Recent advancements in zero-shot commonsense reasoning have empowered Pre-trained Language Models (PLMs) to acquire extensive commonsense knowledge without requiring task-specific fine-tuning. Despite this progress, these models frequently suffer from limitations caused by human reporting biases inherent in textual knowledge, leading to understanding discrepancies between machines and humans. To bridge this gap, we introduce an additional modality to enrich the reasoning capabilities of PLMs. We propose Imagine (Machine Imagination-based Reasoning), a novel zero-shot commonsense reasoning framework that supplements textual inputs with visual signals from machine-generated images. Specifically, we enhance PLMs with the ability to imagine by embedding an image generator directly into the reasoning pipeline. To facilitate effective utilization of this imagined visual context, we construct synthetic datasets designed to emulate visual question-answering scenarios. Through comprehensive evaluations on multiple commonsense reasoning benchmarks, we demonstrate that Imagine substantially outperforms existing zero-shot approaches and even surpasses advanced large language models. These results underscore the capability of machine imagination to mitigate reporting bias and significantly enhance the generalization ability of commonsense reasoning models.
Problem

Research questions and friction points this paper is trying to address.

zero-shot commonsense reasoning
reporting bias
pre-trained language models
machine imagination
visual knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

machine imagination
zero-shot commonsense reasoning
visual knowledge integration
pre-trained language models
reporting bias mitigation