Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether natural language exhibits cross-scale statistical regularities—particularly turbulence-like spectral scaling—in the embedding space of Transformer models. Treating text as a high-dimensional trajectory in embedding space, the authors quantify scale-dependent fluctuations of token sequences and analyze their power spectra. They report, for the first time, a robust 5/3 power-law spectrum in contextual representations, analogous to Kolmogorov’s turbulence spectrum, suggesting that semantic information is integrated across scales in a scale-free, self-similar manner. This phenomenon is consistently observed across multiple languages and in both human- and AI-generated texts, yet vanishes in static embeddings or shuffled sequences, underscoring the critical role of dynamic contextual structure in shaping these statistical properties.
📝 Abstract
Natural language is a complex system that exhibits robust statistical regularities. Here, we represent text as a trajectory in a high-dimensional embedding space generated by transformer-based language models, and quantify scale-dependent fluctuations along the token sequence using an embedding-step signal. Across multiple languages and corpora, the resulting power spectrum exhibits a robust power law with an exponent close to $5/3$ over an extended frequency range. This scaling is observed consistently in contextual embeddings from both human-written and AI-generated text, but is absent in static word embeddings and is disrupted by randomization of token order. These results show that the observed scaling reflects multiscale, context-dependent organization rather than lexical statistics alone. By analogy with the Kolmogorov spectrum in turbulence, our findings suggest that semantic information is integrated in a scale-free, self-similar manner across linguistic scales, and provide a quantitative, model-agnostic benchmark for studying complex structure in language representations.
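The pipeline described in the abstract (embedding-step signal, power spectrum, power-law fit) can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the fitting range, and the random-walk trajectory in the usage example are all hypothetical stand-ins for real contextual embeddings.

```python
import numpy as np

def step_signal(emb):
    """Embedding-step signal: Euclidean norm of successive differences
    along a (T, d) trajectory of token embeddings."""
    return np.linalg.norm(np.diff(emb, axis=0), axis=1)

def power_spectrum(sig):
    """One-sided power spectrum of a 1-D signal (zero-frequency bin dropped)."""
    sig = sig - sig.mean()
    ps = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig))
    return freqs[1:], ps[1:]

def fit_exponent(freqs, ps, fmin=1e-3, fmax=0.3):
    """Least-squares slope in log-log coordinates over [fmin, fmax];
    the fitting range here is illustrative, not the paper's."""
    mask = (freqs >= fmin) & (freqs <= fmax)
    slope, _intercept = np.polyfit(np.log(freqs[mask]), np.log(ps[mask]), 1)
    return -slope

# Illustrative usage on a synthetic random-walk "trajectory" (not real
# contextual embeddings; the 5/3 slope is only claimed for those).
rng = np.random.default_rng(0)
emb = rng.standard_normal((4096, 64)).cumsum(axis=0)
f, p = power_spectrum(step_signal(emb))
alpha = fit_exponent(f, p)
```

A spectrum following the reported scaling would give `alpha` close to 5/3 over the fitted range; the random-walk stand-in above is only there to exercise the pipeline end to end.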
Problem

Research questions and friction points this paper is trying to address.

turbulence · 5/3 scaling · language as a complex system · contextual representations · power spectrum
Innovation

Methods, ideas, or system contributions that make the work stand out.

5/3 spectral scaling · contextual embeddings · complex system · power law · language turbulence
Authors
Zhongxin Yang
College of Engineering, Peking University, Beijing, 100871, China
Chun Bao
College of Engineering, Peking University, Beijing, 100871, China
Yuanwei Bin
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, 315200, Zhejiang, China; Shenzhen Tenfong Technology Co., Ltd., Shenzhen 518000, China
Xiang I. A. Yang
Pennsylvania State University, Stanford University, Johns Hopkins University, Peking University
Turbulence · Turbulent Flows · Machine Learning · Turbulence Modeling
Shiyi Chen
Professor, College of Engineering, EIT and SUSTech
fluid mechanics · turbulence · Computational fluid dynamics · lattice Boltzmann