Unveiling Transformer Perception by Exploring Input Manifolds

📅 2024-10-08
🏛️ arXiv.org
📈 Citations: 0
Influential citations: 0
📄 PDF
🤖 AI Summary
This work investigates how Transformer models perceive inputs by characterizing the geometric structure of semantic equivalence classes in input space. Method: Each Transformer layer is modeled as a diffeomorphic deformation of the input manifold, and a pullback decomposition of the output-space distance metric through the model Jacobian enables unsupervised identification of input equivalence classes and navigation across them, without reliance on task labels. The approach combines differential geometry, Jacobian analysis, and manifold learning to ensure local interpretability. Contribution/Results: Evaluated on Computer Vision and Natural Language Processing benchmarks, the framework shows that the recovered equivalence classes are geometrically consistent and semantically coherent. It establishes a task-agnostic, geometrically interpretable framework for visualizing Transformer internal perception, offering principled insight into how Transformers structurally organize semantically equivalent inputs.
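As a point of reference, the pullback construction behind this summary can be written compactly. The notation below is ours and the paper's own symbols may differ: f is the end-to-end model, J_f its Jacobian at an input x, and M the distance metric on the output space.

\[
G(x) \;=\; J_f(x)^{\top}\, M\big(f(x)\big)\, J_f(x),
\qquad
G(x)\, v_i \;=\; \lambda_i\, v_i .
\]

Eigenvectors v_i with \lambda_i \approx 0 span directions along which the output, and hence the perceived equivalence class, is locally unchanged, while eigenvectors with large \lambda_i give the fastest routes into a different class.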

📝 Abstract
This paper introduces a general method for the exploration of equivalence classes in the input space of Transformer models. The proposed approach is based on sound mathematical theory which describes the internal layers of a Transformer architecture as sequential deformations of the input manifold. Using eigendecomposition of the pullback of the distance metric defined on the output space through the Jacobian of the model, we are able to reconstruct equivalence classes in the input space and navigate across them. We illustrate how this method can be used as a powerful tool for investigating how a Transformer sees the input space, facilitating local and task-agnostic explainability in Computer Vision and Natural Language Processing tasks.
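A minimal sketch of how the pullback-and-eigendecomposition step described in the abstract could be implemented; this is not the authors' code. It assumes a PyTorch model that maps a flat input vector to logits, a Euclidean metric on the output space (M = I), and illustrative function names.

```python
# Hedged sketch, not the authors' implementation: pull the output-space metric
# back through the model Jacobian and eigendecompose it. Assumes a PyTorch
# callable mapping a flat input of shape (n,) to logits of shape (m,),
# with a Euclidean output metric (M = I). Names are illustrative.
import torch
from torch.autograd.functional import jacobian


def pullback_metric(model, x):
    """G(x) = J(x)^T J(x): the output metric pulled back to the input space."""
    J = jacobian(model, x)          # shape (m, n)
    return J.T @ J                  # shape (n, n)


def principal_directions(model, x):
    """Eigenpairs of the pullback metric at x, eigenvalues in ascending order."""
    G = pullback_metric(model, x)
    eigvals, eigvecs = torch.linalg.eigh(G)
    return eigvals, eigvecs


def navigate(model, x, steps=50, step_size=1e-2, within_class=True):
    """Walk the input manifold. Near-null eigenvectors keep the model output
    (and hence the class distribution) locally unchanged, so the walk stays
    inside the equivalence class; the top eigenvector changes the output
    fastest and crosses into a different class."""
    x = x.clone()
    for _ in range(steps):
        _, eigvecs = principal_directions(model, x)
        direction = eigvecs[:, 0] if within_class else eigvecs[:, -1]
        x = x + step_size * direction
    return x
```

Note that G(x) is recomputed at every step, so the walk follows the curved level set of the model output rather than a straight line in input space.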
Problem

Research questions and friction points this paper is trying to address.

Explores equivalence classes in Transformer input space using manifold deformations
Identifies inputs producing identical or different class probability distributions
Projects retrieved instances into human-interpretable format for meaningful analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explores input equivalence classes via manifold deformations
Uses eigendecomposition of Jacobian-pullback distance metrics
Retrieves instances with the same or a different class probability distribution (see the sketch below)
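For illustration only, a usage check built on the hypothetical navigate helper sketched after the abstract: walking along a near-null direction should leave the predicted distribution essentially unchanged, while walking along the top eigenvector should alter it. The variables model and x are assumed to be the model and a starting input from that sketch.

```python
# Illustrative check using the hypothetical helpers sketched above.
import torch

x_same = navigate(model, x, within_class=True)    # stay in the equivalence class
x_diff = navigate(model, x, within_class=False)   # cross into another class

p      = torch.softmax(model(x), dim=-1)
p_same = torch.softmax(model(x_same), dim=-1)
p_diff = torch.softmax(model(x_diff), dim=-1)

print("same-class drift:", torch.norm(p - p_same).item())   # expected to be small
print("cross-class drift:", torch.norm(p - p_diff).item())  # expected to be larger
```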
👥 Authors
A. Benfenati
Department of Environmental Science and Policy, Università degli Studi di Milano, Milano, Italy
Alfio Ferrara
Dipartimento di Informatica, Università degli Studi di Milano
data science, natural language processing, digital humanities
A. Marta
Dipartimento di Scienze del Sistema Nervoso e del Comportamento, Università di Pavia, Pavia, Italy
Davide Riva
Department of Control and Computer Engineering, Politecnico di Torino, Torino, Italy
Elisabetta Rocchetti
Department of Computer Science, Università degli Studi di Milano, Milano, Italy