🤖 AI Summary
This work investigates how Transformer models perceive inputs by characterizing the geometric structure of semantic equivalence classes in input space. Method: Each Transformer layer is modeled as a diffeomorphic transformation of the input manifold, and a pullback feature decomposition grounded in output-distance metrics enables unsupervised identification of input equivalence classes and navigation across them, without reliance on task labels. The approach integrates differential geometry, Jacobian analysis, and manifold learning to ensure local interpretability. Contribution/Results: Evaluated across multiple Computer Vision and NLP benchmarks, the framework demonstrates geometric consistency and semantic coherence of the learned equivalence classes. It establishes the first task-agnostic, geometrically interpretable visualization framework for Transformer internal perception, providing principled insights into how Transformers structurally organize semantically equivalent inputs.
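In standard differential-geometric terms, the pullback construction the summary refers to can be written as follows (a sketch consistent with the abstract below; the notation here is illustrative, not necessarily the paper's):

```latex
% Pullback of the output-space metric M through the model f at input x,
% where J_f(x) is the Jacobian of f at x:
g_x(u, v) \;=\; u^{\top} J_f(x)^{\top} \, M \, J_f(x) \, v,
\qquad
J_f(x)^{\top} M \, J_f(x) \, v_i \;=\; \lambda_i v_i .
```

Eigenvectors $v_i$ with small eigenvalues $\lambda_i$ point in input directions that barely change the output, i.e. approximate tangent directions of an equivalence class; directions with large $\lambda_i$ move across classes.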
📝 Abstract
This paper introduces a general method for exploring equivalence classes in the input space of Transformer models. The approach rests on a mathematical framework that describes the internal layers of a Transformer architecture as sequential deformations of the input manifold. Using the eigendecomposition of the pullback, through the Jacobian of the model, of the distance metric defined on the output space, we reconstruct equivalence classes in the input space and navigate across them. We illustrate how this method serves as a powerful tool for investigating how a Transformer sees the input space, facilitating local, task-agnostic explainability in Computer Vision and Natural Language Processing tasks.
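To make the procedure concrete, here is a minimal sketch of the core computation, assuming a differentiable PyTorch model `f` that maps a flattened input tensor to an output embedding and assuming a Euclidean metric on the output space; the function names, step size, and metric choice are illustrative assumptions, not the paper's implementation:

```python
import torch

def pullback_eigendecomposition(f, x):
    """Eigendecompose the pullback of the (assumed Euclidean) output
    metric through the Jacobian of f at input x.

    Small eigenvalues correspond to input directions that barely change
    the output (moves within an equivalence class); large eigenvalues
    correspond to directions that cross equivalence classes.
    """
    # Jacobian of the model output with respect to the input,
    # flattened to shape (out_dim, in_dim)
    J = torch.autograd.functional.jacobian(f, x)
    J = J.reshape(-1, x.numel())
    # Pullback metric G = J^T M J, with M = I (Euclidean) assumed here
    G = J.T @ J
    # Symmetric eigendecomposition; eigenvalues returned in ascending order
    eigvals, eigvecs = torch.linalg.eigh(G)
    return eigvals, eigvecs

def navigate(f, x, step=0.1, cross_class=True):
    """Take one small step along the top (cross-class) or bottom
    (within-class) eigendirection of the pullback metric at x."""
    eigvals, eigvecs = pullback_eigendecomposition(f, x)
    direction = eigvecs[:, -1] if cross_class else eigvecs[:, 0]
    return x + step * direction.reshape(x.shape)
```

Since the decomposition is only valid locally, navigating an equivalence class would iterate many small within-class steps, re-estimating the Jacobian at each new point.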