Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models

📅 2025-02-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the syntactic modeling capabilities of multimodal language models (MLLMs), specifically examining how visual context induces syntactic priming effects. To this end, we introduce PRISMATIC, the first large-scale multimodal structural priming dataset, and propose a reference-free metric for quantifying syntactic priming strength. Through a systematic comparison of dual-encoder and fusion-encoder architectures, we make the first observation that a robust positive correlation between visual similarity and syntactic priming strength emerges exclusively in fusion-encoder models, a pattern strongly aligned with the cross-modal coupling mechanisms documented in human psycholinguistics; no such correlation is found in dual-encoder models. These findings indicate that fusion architectures achieve deeper syntactic-visual representational coupling, offering novel empirical evidence for structured cognitive modeling in MLLMs and establishing an interpretable, task-grounded evaluation paradigm for syntactic priming in multimodal settings.
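To illustrate the correlation analysis described above, here is a minimal Python sketch that relates visual similarity (cosine similarity between prime and target image embeddings) to a per-item priming score. The names `visual_priming_correlation`, `prime_embs`, `target_embs`, and `priming_scores` are hypothetical and not taken from the paper's code; the sketch only assumes that image embeddings and per-item priming scores are available.

```python
# Illustrative sketch (not the paper's released code): correlate visual
# similarity between prime/target images with per-item syntactic priming
# strength, as in the fusion-encoder analysis described above.
import numpy as np
from scipy.stats import pearsonr


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two image embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def visual_priming_correlation(prime_embs, target_embs, priming_scores):
    """Pearson correlation between image similarity and priming strength.

    prime_embs, target_embs : per-item embedding vectors for the prime and
        target images (e.g., from the model's vision encoder).
    priming_scores : per-item syntactic priming strength from the
        reference-free metric.
    """
    sims = [cosine_similarity(p, t) for p, t in zip(prime_embs, target_embs)]
    r, p_value = pearsonr(sims, priming_scores)
    return r, p_value
```

Under this framing, a significantly positive `r` for fusion-encoder models and a near-zero `r` for dual-encoder models would correspond to the dissociation reported in the summary.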

📝 Abstract
We introduced PRISMATIC, the first multimodal structural priming dataset, and proposed a reference-free evaluation metric that assesses priming effects without predefined target sentences. Using this metric, we constructed and tested models with different multimodal encoding architectures (dual encoder and fusion encoder) to investigate their structural preservation capabilities. Our findings show that models with both encoding methods demonstrate comparable syntactic priming effects. However, only fusion-encoded models exhibit robust positive correlations between priming effects and visual similarity, suggesting a cognitive process more aligned with human psycholinguistic patterns. This work provides new insights into evaluating and understanding how syntactic information is processed in multimodal language models.
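To make the notion of a reference-free priming measure concrete, the following is a hedged sketch, assuming priming strength is approximated by the increase in syntactic overlap between a prime sentence and the model's freely generated description of the target image, with versus without the prime; no predefined target sentence is needed. This uses spaCy dependency parses (requires the `en_core_web_sm` model) as an illustrative stand-in, not the metric defined in the paper.

```python
# Minimal sketch of a reference-free priming measure: how much more the
# model's primed generation reuses the prime's syntactic structure than its
# unprimed generation does. Illustrative only.
import spacy

nlp = spacy.load("en_core_web_sm")


def dependency_rules(sentence: str) -> set:
    """Extract (head POS, dependency label, child POS) rules from a parse."""
    doc = nlp(sentence)
    return {(tok.head.pos_, tok.dep_, tok.pos_) for tok in doc}


def priming_strength(prime: str, primed_output: str, unprimed_output: str) -> float:
    """Increase in syntactic overlap with the prime when the prime is shown.

    Overlap is the Jaccard similarity of dependency rules; the score is the
    primed overlap minus the unprimed overlap, so positive values indicate
    structural priming.
    """
    prime_rules = dependency_rules(prime)

    def overlap(output: str) -> float:
        rules = dependency_rules(output)
        union = prime_rules | rules
        return len(prime_rules & rules) / len(union) if union else 0.0

    return overlap(primed_output) - overlap(unprimed_output)
```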
Problem

Research questions and friction points this paper is trying to address.

Evaluate syntactic priming in multimodal models
Compare encoding architectures' structural preservation
Explore visual-syntactic alignment in cognitive processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal structural priming dataset
Reference-free evaluation metric
Analysis of fusion-encoded models