How do Transformer Embeddings Represent Compositions? A Functional Analysis

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether mainstream Transformer embedding models (Mistral, OpenAI Large, Google Embedding, BERT) exhibit compositional representation for compound words. Method: We construct a transparent adjective–noun compounding dataset and systematically evaluate six composition functions—including ridge regression, vector addition, multiplication, and dilation—to enable the first cross-model quantitative assessment of compositionality. Contribution/Results: (1) Modern models (e.g., Mistral) demonstrate significantly stronger compositionality than BERT; (2) Linear vector addition achieves performance comparable to the optimal nonlinear method (ridge regression), challenging the prevailing assumption that complex composition mechanisms are inherently superior; (3) Functional analysis and embedding visualizations further confirm structural advantages of contemporary models in semantic composition. This work establishes an empirical benchmark and offers theoretical insights for interpretable embedding design and semantic compositional modeling.

📝 Abstract
Compositionality is a key aspect of human intelligence, essential for reasoning and generalization. While transformer-based models have become the de facto standard for many language modeling tasks, little is known about how they represent compound words, and whether these representations are compositional. In this study, we test compositionality in Mistral, OpenAI Large, and Google embedding models, and compare them with BERT. First, we evaluate compositionality in the representations by examining six diverse models of compositionality (addition, multiplication, dilation, regression, etc.). We find that ridge regression, albeit linear, best accounts for compositionality. Surprisingly, we find that the classic vector addition model performs almost as well as any other model. Next, we verify that most embedding models are highly compositional, while BERT shows much poorer compositionality. We verify and visualize our findings with a synthetic dataset consisting of fully transparent adjective-noun compositions. Overall, we present a thorough investigation of compositionality.
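The composition functions the abstract names can be made concrete with a small sketch. The snippet below is a toy illustration on random vectors, not the paper's dataset or embedding models: it implements additive, multiplicative, and dilation composition (one common parameterization, following Mitchell & Lapata-style dilation), plus closed-form ridge regression mapping concatenated adjective/noun vectors to a compound vector. All names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def compose_add(u, v):
    """Additive composition: p = u + v."""
    return u + v

def compose_mult(u, v):
    """Element-wise multiplicative composition: p = u * v."""
    return u * v

def compose_dilation(u, v, lam=2.0):
    """Dilation: stretch v along the direction of u (one common parameterization)."""
    return (u @ u) * v + (lam - 1.0) * (u @ v) * u

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression mapping [adjective; noun] -> compound embedding.
    X: (n_pairs, 2d) concatenated vectors, Y: (n_pairs, d) compound targets."""
    d2 = X.shape[1]
    # W = (X^T X + alpha I)^{-1} X^T Y; predict with X_new @ W
    return np.linalg.solve(X.T @ X + alpha * np.eye(d2), X.T @ Y)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy check on random vectors (real use: embeddings of e.g. "black", "board", "blackboard")
rng = np.random.default_rng(0)
d, n = 16, 200
adj = rng.normal(size=(n, d))
noun = rng.normal(size=(n, d))
# Near-additive synthetic compounds, mirroring the finding that addition works well
target = adj + noun + 0.05 * rng.normal(size=(n, d))
X = np.hstack([adj, noun])
W = fit_ridge(X, target)
pred = X @ W
print(cosine(pred[0], target[0]))
```

On this near-additive toy data, both vector addition and the fitted ridge map reconstruct the compound closely, which is consistent with the paper's observation that simple addition rivals the learned linear map.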
Problem

Research questions and friction points this paper is trying to address.

How transformer embeddings represent compound words compositionally
Comparing compositionality in Mistral, OpenAI Large, Google models vs BERT
Evaluating six models of compositionality in embedding representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates six diverse compositionality models
Uses ridge regression for compositionality analysis
Compares embedding models with synthetic dataset
Aishik Nagar
Machine Learning Engineer, ASUS Intelligent Cloud Services (AICS)
AI for clinical care · AI for healthcare · Cognitive AI · Embodied AI · Multimodal AI
Ishaan Singh Rawal
Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR); Center for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR); Texas A&M University
Mansi Dhanania
Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR); Center for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR); McGill University
Cheston Tan
Institute for Infocomm Research; Centre for Frontier AI Research
Cognitively-Inspired AI · Embodied AI · AGI · Human-Centric Systems · Assistive AI