Symbolic and Abstractive Reasoning with Complex Visual Queries

2026-06-08
Citations: 0
Influential: 0
AI Summary
This work addresses the limited symbolic and abstract reasoning capabilities of multimodal large language models when processing abstract visual content. To this end, it introduces Complex Visual Queries (CVQs), a novel data type for which diverse reasoning tasks are synthesized systematically by combining large-scale multimodal knowledge graphs with first-order logic operators. The authors propose a two-stage training framework designed to enhance the model's neuro-symbolic reasoning abilities. Experimental results demonstrate that the proposed approach significantly improves performance across 14 CVQ categories and exhibits strong generalization across tasks and scenarios. This study establishes a scalable paradigm for data synthesis and training in multimodal reasoning, offering a foundation for future advances in abstract visual understanding.
Abstract
Understanding and reasoning over abstract visual content remains a challenge for current multi-modal large language models (MLLMs). In this paper, we explore a novel abstract data type termed complex visual query (CVQ), designed to probe symbolic and abstractive reasoning, which is a critical yet underexplored dimension of human-like neuro-symbolic reasoning for MLLMs. We present a comprehensive investigation from three perspectives: Data × Paradigm × Exploration. Specifically, we propose a scalable pipeline for synthesizing CVQs grounded in large-scale multi-modal knowledge graphs, generating a diverse dataset encompassing 14 distinct query types via systematic combinations of first-order logic operators. We further introduce a two-stage training framework that progressively equips MLLMs with robust visual reasoning capabilities. We conduct extensive experiments to rigorously evaluate MLLMs across multiple dimensions, including reasoning performance on CVQs, as well as cross-task and cross-scenario generalization. We believe our work opens new perspectives and avenues for advancing the reasoning frontiers of MLLMs.
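To make "systematic combinations of first-order logic operators" over a knowledge graph more concrete, here is a minimal sketch in Python. It is not the authors' pipeline. The toy graph, the entity and relation names, and the helper functions (project, inverse_project, intersect, union, negate) are all hypothetical, and the four operators are assumed to be the set-style projection, intersection, union, and negation that are common in knowledge-graph query answering.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy knowledge graph: (head, relation) -> set of tails.
# "img_eiffel" stands in for an image node in a multi-modal graph.
KG: Dict[Tuple[str, str], Set[str]] = {
    ("img_eiffel", "depicts"): {"Eiffel Tower"},
    ("Eiffel Tower", "located_in"): {"Paris"},
    ("Louvre", "located_in"): {"Paris"},
    ("Big Ben", "located_in"): {"London"},
    ("Eiffel Tower", "type"): {"tower"},
    ("Big Ben", "type"): {"tower"},
    ("Louvre", "type"): {"museum"},
}

# Universe of entities, needed to define negation as a set complement.
ENTITIES: Set[str] = {h for (h, _) in KG} | {t for tails in KG.values() for t in tails}


def project(sources: Set[str], relation: str) -> Set[str]:
    """Follow `relation` forward from every entity in `sources`."""
    out: Set[str] = set()
    for s in sources:
        out |= KG.get((s, relation), set())
    return out


def inverse_project(targets: Set[str], relation: str) -> Set[str]:
    """Find every head that reaches some entity in `targets` via `relation`."""
    return {h for (h, r), tails in KG.items() if r == relation and tails & targets}


def intersect(*sets: Set[str]) -> Set[str]:
    """Logical AND over answer sets."""
    return set.intersection(*sets)


def union(*sets: Set[str]) -> Set[str]:
    """Logical OR over answer sets."""
    return set.union(*sets)


def negate(s: Set[str]) -> Set[str]:
    """Logical NOT, taken relative to the entity universe."""
    return ENTITIES - s


# Two-hop projection: where is the thing shown in the image located?
two_hop = project(project({"img_eiffel"}, "depicts"), "located_in")

# Intersection: entities that are towers AND located in Paris.
towers = inverse_project({"tower"}, "type")
in_paris = inverse_project({"Paris"}, "located_in")
tower_in_paris = intersect(towers, in_paris)

# Intersection with negation: towers that are NOT located in Paris.
tower_not_in_paris = intersect(towers, negate(in_paris))

# Union: entities located in Paris OR London.
in_paris_or_london = union(in_paris, inverse_project({"London"}, "located_in"))
```

Nesting these helpers in different orders gives different query shapes, which is one plausible reading of how a fixed set of operators can yield many distinct query types. The abstract does not say which 14 shapes the authors use, so none are enumerated here.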
Problem

Research questions and friction points this paper is trying to address.

symbolic reasoning
abstractive reasoning
complex visual query
multi-modal large language models
visual reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

complex visual query
symbolic reasoning
abstractive reasoning
multi-modal large language models
neuro-symbolic reasoning