Sequence Graphs Realizations and Ambiguity in Language Models

📅 2020-03-04

🏛️ International Computing and Combinatorics Conference

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates the ambiguity in sequence graph representations induced by the bag-of-words assumption in language models. Given a window size w, a directed or undirected structure, and optional edge multiplicities (weights), we study two fundamental questions: (i) realizability, i.e., whether a given sequence graph corresponds to at least one valid sequence; and (ii) enumerability, i.e., how many distinct sequences map to the same graph. We establish a systematic theoretical framework for sequence graph realizability, built on a three-level generalized model that jointly accounts for window size, edge directionality, and edge weights. We design algorithms, including a dynamic programming formulation, for counting and enumerating the sequences that realize a given graph. We show that even small windows (e.g., w = 2) can induce substantial ambiguity: many distinct sentences, including semantically divergent ones, share an identical sequence graph. Furthermore, we identify several core combinatorial problems whose computational complexity remains open. Our results indicate that bag-of-words compression undermines the uniqueness of representations, which poses intrinsic challenges to model interpretability and robustness.

📝 Abstract

Several natural language models rely on an assumption that models each word context as a bag of words. We study the combinatorial implications of this assumption for the corresponding word or sentence representations. In particular, we present theoretical results concerning the family of sequence graphs, whose realizations yield equivalent representations under this assumption. Several combinatorial problems are presented, depending on three levels of generalisation (window size, graph orientation, and weights), and whether some of them are NP-complete is left open. Based on these results, we also establish different algorithms, including a dynamic programming formulation, to count and enumerate the realizations of a sequence graph. This allows us to show that the bag-of-words assumption can cause a large number of sentences to have the same representation, even for relatively short context window sizes.
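To make the object of study concrete, here is a minimal sketch of how a weighted sequence graph could be built from a token sequence. It is an illustration, not the paper's code. It assumes the following convention, which the text above does not spell out: an edge (u, v) counts the pairs of positions i < j with j - i <= w - 1 such that token i is u and token j is v, so w = 2 links only adjacent tokens. The function name `sequence_graph` and its parameters are hypothetical.

```python
from collections import Counter


def sequence_graph(tokens, w=2, directed=True):
    """Return the weighted sequence graph as a Counter {(u, v): multiplicity}.

    Assumed convention (not taken from the paper): two positions are linked
    when they are at most w - 1 apart, so w = 2 means adjacent tokens only.
    """
    edges = Counter()
    n = len(tokens)
    for i in range(n):
        # Positions i+1 .. i+w-1 fall inside the window that starts at i.
        for j in range(i + 1, min(i + w, n)):
            u, v = tokens[i], tokens[j]
            if not directed:
                # Undirected edges are stored with sorted endpoints.
                u, v = sorted((u, v))
            edges[(u, v)] += 1
    return edges


s1 = "the dog and the cat and the bird".split()
s2 = "the cat and the dog and the bird".split()

# Two different sentences, one and the same graph for w = 2.
print(sequence_graph(s1, w=2) == sequence_graph(s2, w=2))  # True
```

Under this convention the two sentences are indistinguishable at w = 2 but are separated at w = 3, which shows how the window size controls the amount of ambiguity.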

Problem

Research questions and friction points this paper is trying to address.

Study realizability and ambiguity of sequence graphs in language models

Analyze combinatorial and algorithmic aspects of sequence graph realizations

Investigate polynomial-time algorithms and hardness results for the various graph settings
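As an illustration of the realizability question in its simplest setting, the sketch below checks whether a directed, weighted graph can arise from some sequence when w = 2. It assumes that an edge (u, v) of weight k means "u is immediately followed by v exactly k times"; under that assumption a realization is a walk that uses every edge exactly as often as its weight, so the classical Eulerian trail conditions apply. This is a standard graph-theoretic test and not necessarily the procedure used in the paper; other settings (unweighted, undirected, larger windows) are not covered. The name `is_realizable_w2` is hypothetical.

```python
from collections import Counter, defaultdict


def is_realizable_w2(edges):
    """Eulerian-trail test on a directed multigraph {(u, v): multiplicity}.

    Assumption: w = 2, directed, weighted, and every vertex touches an edge.
    """
    out_deg, in_deg = Counter(), Counter()
    neighbours = defaultdict(set)  # underlying undirected adjacency
    for (u, v), k in edges.items():
        if k <= 0:
            continue
        out_deg[u] += k
        in_deg[v] += k
        neighbours[u].add(v)
        neighbours[v].add(u)

    vertices = set(neighbours)
    if not vertices:
        return False  # convention for this sketch: an empty graph is rejected

    # Degree condition: all vertices balanced, or exactly one start vertex
    # (out - in = 1) and exactly one end vertex (in - out = 1).
    surplus = [out_deg[x] - in_deg[x] for x in vertices]
    if any(abs(d) > 1 for d in surplus):
        return False
    starts = sum(1 for d in surplus if d == 1)
    ends = sum(1 for d in surplus if d == -1)
    if (starts, ends) not in {(0, 0), (1, 1)}:
        return False

    # Connectivity condition: all edges lie in one weakly connected component.
    first = next(iter(vertices))
    seen, stack = {first}, [first]
    while stack:
        x = stack.pop()
        for y in neighbours[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen == vertices


tokens = "the dog and the cat and the bird".split()
print(is_realizable_w2(Counter(zip(tokens, tokens[1:]))))  # True
print(is_realizable_w2({("a", "b"): 2}))                   # False
```

The second call fails because "a" would have to be followed by "b" twice while nothing ever leads back to "a".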

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequence graphs model word co-occurrence in windows

Polynomial algorithms for realizability at window size 2

Dynamic programming to count and enumerate realizations of moderately sized graphs
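The counting idea can be sketched for the simplest case as follows. This is a naive memoized search and not the paper's dynamic programming formulation: it assumes w = 2 with a directed, weighted graph in which an edge (u, v) of weight k means "u is immediately followed by v exactly k times", and it counts the distinct token sequences that consume every edge exactly as often as its weight. The state is the current token together with the remaining edge multiplicities, so it is only practical for small graphs. The name `count_realizations_w2` is hypothetical.

```python
from collections import Counter
from functools import lru_cache


def count_realizations_w2(edges):
    """Count distinct sequences whose w = 2 directed weighted graph is `edges`.

    Assumption: `edges` maps (u, v) to a positive multiplicity and every
    token of the sequence appears in at least one edge.
    """
    items = sorted(edges.items())
    keys = [pair for pair, _ in items]
    initial = tuple(weight for _, weight in items)
    vertices = sorted({x for pair in keys for x in pair})

    @lru_cache(maxsize=None)
    def walks(current, remaining):
        # All edges consumed: exactly one way to finish (stop here).
        if not any(remaining):
            return 1
        total = 0
        for idx, (u, v) in enumerate(keys):
            if u == current and remaining[idx] > 0:
                nxt = remaining[:idx] + (remaining[idx] - 1,) + remaining[idx + 1:]
                total += walks(v, nxt)
        return total

    # Try every token as the first token of the sequence.
    return sum(walks(start, initial) for start in vertices)


tokens = "the dog and the cat and the bird".split()
graph = Counter(zip(tokens, tokens[1:]))  # adjacent-pair counts (w = 2)
print(count_realizations_w2(graph))       # 2
```

The two realizations found here are the original sentence and the one with "dog" and "cat" swapped. Each additional loop through a shared hub word adds further orderings, which is how the number of realizations can grow quickly even at w = 2.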

🔎 Similar Papers

Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path