Global Geometry Is Not Enough for Vision Representations

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of visual representation methods and evaluations that rely on global geometric structure: such structure says little about how the elements of an image are composed. In a systematic evaluation of 21 vision encoders, the study finds that standard geometric metrics are nearly uncorrelated with compositional binding capacity. The authors propose functional sensitivity, measured via the input-output Jacobian, as a complementary evaluation dimension. Combining geometric statistics, Jacobian analysis, and an analytic derivation, they show that functional sensitivity reliably tracks compositional binding, and they trace the disparity to objective design: existing losses constrain embedding geometry but leave the local input-output mapping unconstrained. The results argue for evaluating representations along an axis beyond global embedding geometry.

📝 Abstract

A common assumption in representation learning is that globally well-distributed embeddings support robust and generalizable representations. This focus has shaped both training objectives and evaluation protocols, implicitly treating global geometry as a proxy for representational competence. While global geometry effectively encodes which elements are present, it is often insensitive to how they are composed. We investigate this limitation by testing the ability of geometric metrics to predict compositional binding across 21 vision encoders. We find that standard geometry-based statistics exhibit near-zero correlation with compositional binding. In contrast, functional sensitivity, as measured by the input-output Jacobian, reliably tracks this capability. We further provide an analytic account showing that this disparity arises from objective design, as existing losses explicitly constrain embedding geometry but leave the local input-output mapping unconstrained. These results suggest that global embedding geometry captures only a partial view of representational competence and establish functional sensitivity as a critical complementary axis for modeling composite structure.
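To make "functional sensitivity, as measured by the input-output Jacobian" concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: the two-layer `toy_encoder`, the central finite-difference estimator `jacobian_fd`, and the choice of the Frobenius norm as a scalar summary are all assumptions made for illustration, since the abstract does not specify how the Jacobian is computed or summarized.

```python
import numpy as np

def toy_encoder(x, W1, W2):
    """Hypothetical encoder (assumption): tanh hidden layer, L2-normalised output."""
    h = np.tanh(W1 @ x)
    z = W2 @ h
    return z / np.linalg.norm(z)

def jacobian_fd(f, x, eps=1e-5):
    """Central finite-difference estimate of df/dx, shape (d_out, d_in)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Column i: change in the output per unit change in input coordinate i.
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def functional_sensitivity(f, x):
    """Frobenius norm of the Jacobian at x (one possible scalar summary, an assumption)."""
    return float(np.linalg.norm(jacobian_fd(f, x)))

# Toy setup: random weights and a random 8-dimensional input.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))
W2 = rng.normal(size=(4, 16))
encode = lambda x: toy_encoder(x, W1, W2)

x = rng.normal(size=8)
J = jacobian_fd(encode, x)                 # local input-output map at x
sens = functional_sensitivity(encode, x)   # scalar sensitivity at x
```

Geometry-based statistics are computed from the set of output embeddings alone, whereas the quantity above depends on how the output moves when the input is perturbed. Two encoders could therefore produce similarly distributed embeddings while differing in this local mapping. For a real vision encoder, automatic differentiation would normally replace the finite-difference loop.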

Problem

Research questions and friction points this paper is trying to address.

representation learning

global geometry

compositional binding

vision encoders

functional sensitivity

Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional binding

functional sensitivity

embedding geometry