🤖 AI Summary
This work investigates how Transformer models perceive inputs by characterizing the geometric structure of semantic equivalence classes in input space. Method: Each Transformer layer is modeled as a diffeomorphic transformation of the input manifold, and a pullback feature decomposition grounded in output-distance metrics enables unsupervised identification of input equivalence classes and navigation across them, without reliance on task labels. The approach integrates differential geometry, Jacobian analysis, and manifold learning to ensure local interpretability. Contribution/Results: Evaluated across multiple Computer Vision and NLP benchmarks, the framework demonstrates geometric consistency and semantic coherence of the learned equivalence classes. It establishes the first task-agnostic, geometrically interpretable visualization framework for Transformer internal perception, providing principled insights into how Transformers structurally organize semantically equivalent inputs.
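In standard differential-geometric terms, the pullback construction the summary refers to can be written as follows (a sketch consistent with the abstract below; the notation here is illustrative, not necessarily the paper's):

```latex
% Pullback of the output-space metric M through the model f at input x,
% where J_f(x) is the Jacobian of f at x:
g_x(u, v) \;=\; u^{\top} J_f(x)^{\top} \, M \, J_f(x) \, v,
\qquad
J_f(x)^{\top} M \, J_f(x) \, v_i \;=\; \lambda_i v_i .
```

Eigenvectors $v_i$ with small eigenvalues $\lambda_i$ point in input directions that barely change the output, i.e. approximate tangent directions of an equivalence class; directions with large $\lambda_i$ move across classes.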
📝 Abstract
This paper introduces a general method for exploring equivalence classes in the input space of Transformer models. The approach rests on a mathematical framework that describes the internal layers of a Transformer architecture as sequential deformations of the input manifold. Using the eigendecomposition of the pullback, through the Jacobian of the model, of the distance metric defined on the output space, we reconstruct equivalence classes in the input space and navigate across them. We illustrate how this method serves as a powerful tool for investigating how a Transformer sees the input space, facilitating local, task-agnostic explainability in Computer Vision and Natural Language Processing tasks.
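To make the procedure concrete, here is a minimal sketch of the core computation, assuming a differentiable PyTorch model `f` that maps a flattened input tensor to an output embedding and assuming a Euclidean metric on the output space; the function names, step size, and metric choice are illustrative assumptions, not the paper's implementation:

```python
import torch

def pullback_eigendecomposition(f, x):
    """Eigendecompose the pullback of the (assumed Euclidean) output
    metric through the Jacobian of f at input x.

    Small eigenvalues correspond to input directions that barely change
    the output (moves within an equivalence class); large eigenvalues
    correspond to directions that cross equivalence classes.
    """
    # Jacobian of the model output with respect to the input,
    # flattened to shape (out_dim, in_dim)
    J = torch.autograd.functional.jacobian(f, x)
    J = J.reshape(-1, x.numel())
    # Pullback metric G = J^T M J, with M = I (Euclidean) assumed here
    G = J.T @ J
    # Symmetric eigendecomposition; eigenvalues returned in ascending order
    eigvals, eigvecs = torch.linalg.eigh(G)
    return eigvals, eigvecs

def navigate(f, x, step=0.1, cross_class=True):
    """Take one small step along the top (cross-class) or bottom
    (within-class) eigendirection of the pullback metric at x."""
    eigvals, eigvecs = pullback_eigendecomposition(f, x)
    direction = eigvecs[:, -1] if cross_class else eigvecs[:, 0]
    return x + step * direction.reshape(x.shape)
```

Since the decomposition is only valid locally, navigating an equivalence class would iterate many small within-class steps, re-estimating the Jacobian at each new point.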