Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system

📅 2026-04-07
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study asks whether natural language shows cross-scale statistical regularities, specifically turbulence-like spectral scaling, in the contextual representations produced by Transformer language models. Treating text as a trajectory in a high-dimensional embedding space, the authors quantify scale-dependent fluctuations along the token sequence and analyze the resulting power spectra. They report a robust power-law spectrum with an exponent close to 5/3 over an extended frequency range, analogous to Kolmogorov's turbulence spectrum, which suggests that semantic information is integrated across scales in a scale-free, self-similar manner. The scaling appears consistently across multiple languages and in both human-written and AI-generated text, but is absent in static word embeddings and is disrupted when token order is randomized, which points to context-dependent structure rather than lexical statistics alone as its source.
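
For orientation, the analogy can be written out. The first line is the standard Kolmogorov inertial-range energy spectrum; the second is one way to state the scaling reported here. The symbols $S$, $f$, and $\beta$ are notation assumed for this note and are not taken from the paper.

```latex
% Kolmogorov inertial-range spectrum: E is the energy spectrum, k the wavenumber,
% \varepsilon the mean energy dissipation rate, C_K the Kolmogorov constant.
E(k) = C_K \, \varepsilon^{2/3} \, k^{-5/3}

% Assumed notation for the language analogue: S is the power spectrum of the
% embedding-step signal, f a frequency along the token sequence (cycles per token).
S(f) \propto f^{-\beta}, \qquad \beta \approx \tfrac{5}{3}
```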

📝 Abstract
Natural language is a complex system that exhibits robust statistical regularities. Here, we represent text as a trajectory in a high-dimensional embedding space generated by transformer-based language models, and quantify scale-dependent fluctuations along the token sequence using an embedding-step signal. Across multiple languages and corpora, the resulting power spectrum exhibits a robust power law with an exponent close to $5/3$ over an extended frequency range. This scaling is observed consistently in contextual embeddings from both human-written and AI-generated text, but is absent in static word embeddings and is disrupted by randomization of token order. These results show that the observed scaling reflects multiscale, context-dependent organization rather than lexical statistics alone. By analogy with the Kolmogorov spectrum in turbulence, our findings suggest that semantic information is integrated in a scale-free, self-similar manner across linguistic scales, and provide a quantitative, model-agnostic benchmark for studying complex structure in language representations.
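
The measurement described in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not define the embedding-step signal, so it is taken here to be the Euclidean norm of the difference between consecutive token embeddings, the spectrum is a plain periodogram, and the exponent comes from a least-squares fit in log-log space. All function names, the fit band, and the synthetic generator are hypothetical.

```python
import numpy as np


def embedding_step_signal(embeddings):
    """Assumed definition: norm of the step between consecutive token embeddings.

    `embeddings` has shape (n_tokens, dim); the result has length n_tokens - 1.
    """
    emb = np.asarray(embeddings, dtype=float)
    return np.linalg.norm(np.diff(emb, axis=0), axis=1)


def power_spectrum(signal):
    """Periodogram of a mean-removed 1-D signal, with the zero frequency dropped."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # cycles per token
    return freqs[1:], power[1:]


def spectral_exponent(freqs, power, f_min, f_max):
    """Fit power ~ f**(-beta) on [f_min, f_max] and return beta."""
    mask = (freqs >= f_min) & (freqs <= f_max) & (power > 0)
    slope, _intercept = np.polyfit(np.log(freqs[mask]), np.log(power[mask]), 1)
    return -slope


def synthetic_power_law_signal(n, beta, seed=0):
    """Hypothetical test helper: random-phase signal whose spectrum is f**(-beta)."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n, d=1.0)
    amplitude = np.zeros_like(freqs)
    amplitude[1:] = freqs[1:] ** (-beta / 2.0)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.size)
    return np.fft.irfft(amplitude * np.exp(1j * phases), n=n)
```

To use it on real text, one would pass the per-token hidden states of a contextual model to `embedding_step_signal` and fit the exponent over whatever frequency band shows a straight line on a log-log plot. Note that permuting the step signal itself is only a crude surrogate for the paper's control, which randomizes token order before the text is embedded.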
Problem

Research questions and friction points this paper is trying to address.

turbulence
5/3 scaling
language as a complex system
contextual representations
power spectrum
Innovation

Methods, ideas, or system contributions that make the work stand out.

5/3 spectral scaling
contextual embeddings
complex system
power law
language turbulence
🔎 Similar Papers