Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder

📅 2025-06-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This survey addresses the gap between symbolic and distributional semantics, with the aim of enhancing interpretability, controllability, compositional generalisation, and robustness in autoregressive Transformer language models. Method: It outlines a unified learning perspective that integrates compositional semantics with distributed representations, built upon variational autoencoders (VAEs), vector-quantised VAEs (VQ-VAEs), and sparse autoencoders (SAEs). By systematically examining how latent-space geometry affects semantic compositionality and interpretability, it characterises structured geometric properties of latent variables from a compositional-semantic perspective, relating latent-space topology to linguistic meaning. Contribution/Results: The reviewed evidence indicates that structured latent representations improve semantic disentanglement and controllable generation. The survey provides both theoretical foundations and architectural blueprints for next-generation semantic models that reconcile symbolic rigor with distributional robustness.

πŸ“ Abstract
Integrating compositional and symbolic properties into current distributional semantic spaces can enhance the interpretability, controllability, compositionality, and generalisation capabilities of Transformer-based auto-regressive language models (LMs). In this survey, we offer a novel perspective on latent space geometry through the lens of compositional semantics, a direction we refer to as semantic representation learning. This direction bridges symbolic and distributional semantics, helping to mitigate the gap between them. We review and compare three mainstream autoencoder architectures: the Variational AutoEncoder (VAE), the Vector Quantised VAE (VQVAE), and the Sparse AutoEncoder (SAE). For each, we examine the distinctive latent geometry it induces and how that geometry relates to semantic structure and interpretability.
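To make the comparison concrete, the sketch below illustrates the three latent bottlenecks the survey reviews. It is not taken from the paper: the class names, dimensions, and loss weights (e.g. GaussianBottleneck, d_z=32, beta=0.25) are hypothetical placeholders, and the objectives shown are the standard textbook VAE, VQ-VAE, and sparse-autoencoder forms rather than any architecture proposed by the authors.

```python
# Minimal, illustrative sketches of the three latent bottlenecks compared in the
# survey (VAE, VQ-VAE, SAE). All names and hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianBottleneck(nn.Module):
    """VAE: continuous latent with a Gaussian posterior (reparameterisation trick)."""
    def __init__(self, d_in=256, d_z=32):
        super().__init__()
        self.mu = nn.Linear(d_in, d_z)
        self.logvar = nn.Linear(d_in, d_z)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # sample z ~ q(z|x)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl.mean()


class QuantisedBottleneck(nn.Module):
    """VQ-VAE: discrete latent chosen by nearest-neighbour lookup in a codebook."""
    def __init__(self, n_codes=512, d_z=32, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, d_z)
        self.beta = beta

    def forward(self, z_e):
        # distances to every codebook vector; pick the closest code per input
        d = torch.cdist(z_e, self.codebook.weight)
        z_q = self.codebook(d.argmin(dim=-1))
        # codebook + commitment losses; straight-through estimator for the encoder
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()
        return z_q, loss


class SparseBottleneck(nn.Module):
    """SAE: overcomplete latent pushed towards sparsity (here via an L1 penalty)."""
    def __init__(self, d_in=256, d_z=1024, l1=1e-3):
        super().__init__()
        self.enc = nn.Linear(d_in, d_z)
        self.dec = nn.Linear(d_z, d_in)
        self.l1 = l1

    def forward(self, h):
        z = F.relu(self.enc(h))                      # sparse, non-negative code
        loss = F.mse_loss(self.dec(z), h) + self.l1 * z.abs().mean()
        return z, loss


if __name__ == "__main__":
    h = torch.randn(8, 256)                          # stand-in for an encoder state
    for layer in (GaussianBottleneck(), SparseBottleneck()):
        z, loss = layer(h)
        print(type(layer).__name__, tuple(z.shape), float(loss))
    z, loss = QuantisedBottleneck()(torch.randn(8, 32))
    print("QuantisedBottleneck", tuple(z.shape), float(loss))
```

The three bottlenecks induce the different latent geometries discussed in the survey: a smooth continuous manifold (VAE), a discrete codebook partition (VQ-VAE), and a sparse, overcomplete feature basis (SAE).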
Problem

Research questions and friction points this paper is trying to address.

Bridging compositional and distributional semantics in language models
Exploring latent space geometry via autoencoder architectures
Enhancing interpretability and controllability of semantic representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates compositional and symbolic semantics
Surveys VAE, VQVAE, SAE architectures
Bridges symbolic and distributional semantics
Yingji Zhang
University of Manchester
Computational Linguistics, Representation Learning, Disentanglement, Multi-modal Learning
Danilo S. Carvalho
University of Manchester
Artificial Intelligence, Natural Language Processing
André Freitas
Department of Computer Science, University of Manchester, UK; Idiap Research Institute, Switzerland; Cancer Biomarker Centre, CRUK Manchester Institute, UK