Anchored, Not Graded: Vision-Language Models Fail at Slant-from-Texture Perception

📅 2026-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether vision-language models (VLMs) show the systematic, graded biases that humans exhibit when perceiving surface slant from texture. Employing psychophysical experimental paradigms together with zero-shot and in-context prompting, the authors evaluate multiple VLM families and model scales, and complement this with supervised fine-tuning analyses. They report a pronounced anchoring effect on this low-level geometric task: models output only a small set of slant values (e.g., 0°, ±25°, ±45°), with little dependence on stimulus field of view, optical slant, or surface curvature. Supervised fine-tuning partially remediates the failure, but residual anchoring persists. The authors interpret anchoring as a failure at the interface between visual representation and language output: not necessarily an absence of geometric encoding, but a failure to express it in graded form.
📝 Abstract
Human perception of surface slant from texture exhibits systematic, graded biases that emerge reliably in psychophysical experiments. Prior work showed that unsupervised CNNs reproduce several human-like biases, while supervised CNNs do not. Do Vision-Language Models (VLMs) exhibit similar competences? Across multiple VLM families and model scales, zero-shot and in-context prompting both produce distinctive failures: slant is predicted at only a small set of anchors (e.g., 0°, ±25°, ±45°) with little dependence on stimulus field of view, optical slant, or surface curvature. Supervised fine-tuning partially remediates the failure, but residual anchoring persists. While success in high-level vision-language benchmarks might not require sensitivity to low-level geometric cues, we interpret anchoring as a failure at the representation-to-output language interface: not necessarily an absence of geometric encoding, but a failure to express it in a graded form.
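One way to make the contrast between graded and anchored responses concrete is to summarise a set of predictions by how many distinct values they take, how much of the response mass sits on the few most common values, and how steeply predictions track the true slant. The sketch below is illustrative only: the function name `anchoring_summary`, the choice of these three statistics, and the toy numbers are assumptions made here, not the authors' metrics or data.

```python
from collections import Counter


def anchoring_summary(true_slants, predicted_slants, top_k=3):
    """Summarise how 'anchored' a set of slant predictions is.

    Hypothetical helper, not taken from the paper. Returns the number of
    distinct predicted values, the fraction of responses falling on the
    top_k most frequent values, and the least-squares slope of predicted
    slant against true slant.
    """
    if len(true_slants) != len(predicted_slants) or not true_slants:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(true_slants)
    counts = Counter(predicted_slants)
    top_k_mass = sum(c for _, c in counts.most_common(top_k)) / n

    mean_true = sum(true_slants) / n
    mean_pred = sum(predicted_slants) / n
    var_true = sum((t - mean_true) ** 2 for t in true_slants)
    cov = sum(
        (t - mean_true) * (p - mean_pred)
        for t, p in zip(true_slants, predicted_slants)
    )
    # Slope is undefined when all true slants are identical.
    slope = cov / var_true if var_true > 0 else float("nan")

    return {
        "distinct_values": len(counts),
        "top_k_mass": top_k_mass,
        "slope": slope,
    }


# Toy data in degrees, invented purely for illustration.
true_slants = [-60, -40, -20, 0, 20, 40, 60]
graded_preds = [-50, -35, -18, 0, 18, 35, 50]  # varies smoothly with the stimulus
anchored_preds = [-45, 0, 0, 0, 0, 0, 45]      # collapses onto a few anchors

graded = anchoring_summary(true_slants, graded_preds)
anchored = anchoring_summary(true_slants, anchored_preds)
print(graded)
print(anchored)
```

On this made-up example the anchored responses use fewer distinct values, concentrate all of their mass on three values, and have a shallower slope than the graded responses. A real analysis would need the paper's actual stimuli and model outputs.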
Problem

Research questions and friction points this paper is trying to address.

slant-from-texture
Vision-Language Models
perceptual bias
anchoring
graded perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
slant-from-texture
anchoring bias
graded perception
geometric representation