Average shortest-path length in word-adjacency networks: Chinese versus English

📅 2025-12-01
🏛️ Physical Review E
🤖 AI Summary
This study compares the average shortest-path length L(N) of Chinese and English word-adjacency networks and examines how it depends on punctuation. Unconventionally, punctuation marks are included in the networks as ordinary nodes, a choice motivated by the authors' earlier findings that punctuation behaves like words in a Zipfian analysis and can improve authorship attribution. Using growing word-adjacency networks built from literary texts of different periods in both languages, including translations of selected novels, the study tracks how L(N) changes with the network size N and approximates the empirical curves with a growing network model, obtaining satisfactory agreement. When punctuation is included, L(N) behaves asymptotically similarly in Chinese and English; when punctuation is excluded, it becomes sizably larger for Chinese.

📝 Abstract
Complex networks provide powerful tools for analyzing and understanding the intricate structures present in various systems, including natural language. Here, we analyze the topology of growing word-adjacency networks constructed from Chinese and English literary works written in different periods. Unconventionally, instead of considering dictionary words only, we also include punctuation marks as if they were ordinary words. Our approach is based on two arguments: (1) punctuation carries genuine information related to emotional state, allows for logical grouping of content, provides a pause in reading, and facilitates understanding by avoiding ambiguity, and (2) our previous works have shown that punctuation marks behave like words in a Zipfian analysis and, if considered together with regular words, can improve authorship attribution in stylometric studies. We focus on the functional dependence of the average shortest path length L(N) on the network size N for different epochs and individual novels in their original language as well as for translations of selected novels into the other language. We approximate the empirical results with a growing network model and obtain satisfactory agreement between the two. We also observe that L(N) behaves asymptotically similarly for both languages if punctuation marks are included but becomes sizably larger for Chinese if punctuation marks are neglected.
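To make the measured quantity concrete, the following is a minimal Python sketch of how L(N) could be tracked as a word-adjacency network grows. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`tokenize`, `growth_curve`, `average_shortest_path_length`), the regex-based tokenizer, the undirected unweighted edges, and the choice to link the words on either side of a dropped punctuation mark are all hypothetical. The tokenizer only suits whitespace-delimited text; Chinese text would first need a word segmenter.

```python
import re
from collections import deque


def tokenize(text, keep_punctuation=True):
    """Lowercase tokens; each punctuation mark is its own token if kept (assumed scheme)."""
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    return [t.lower() for t in re.findall(pattern, text)]


def average_shortest_path_length(adj):
    """Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def growth_curve(tokens, step=1000):
    """Return (N, L) samples, where N is the number of distinct nodes seen so far."""
    adj, curve, prev = {}, [], None
    for i, tok in enumerate(tokens, 1):
        adj.setdefault(tok, set())
        if prev is not None and prev != tok:
            # Consecutive tokens are linked by an undirected edge (assumption).
            adj[prev].add(tok)
            adj[tok].add(prev)
        prev = tok
        if i % step == 0 or i == len(tokens):
            curve.append((len(adj), average_shortest_path_length(adj)))
    return curve


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times."
    for keep in (True, False):
        n, l = growth_curve(tokenize(sample, keep), step=100)[-1]
        print(f"punctuation kept={keep}: N={n}, L={l:.3f}")
```

Running `growth_curve` on the same text with and without punctuation gives the two curves whose asymptotic behavior the abstract compares. The all-pairs breadth-first search costs O(N·E) per sample, so for full novels one would sample L(N) sparsely (a large `step`) or estimate it from a random subset of source nodes.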
Problem

Research questions and friction points this paper is trying to address.

average shortest-path length
word-adjacency networks
Chinese versus English
punctuation marks
complex networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

word-adjacency networks
punctuation as nodes
average shortest-path length
cross-linguistic comparison
growing network model
Authors
Jakub Dec
Faculty of Computer Science and Telecommunications, Cracow University of Technology, ul. Warszawska 25, 31-155 Kraków, Poland
Michał Dolina
Faculty of Computer Science and Telecommunications, Cracow University of Technology, ul. Warszawska 25, 31-155 Kraków, Poland
Stanisław Drożdż
IFJ PAN Kraków and Cracow University of Technology
complex systems, financial markets, quantitative linguistics, nuclear physics
Jarosław Kwapień
Institute of Nuclear Physics, Polish Academy of Sciences
nonlinear dynamics, complex systems, complex networks, econophysics, quantitative linguistics
Jin Liu
School of Modern Languages, Georgia Institute of Technology, Swann Building, 613 Cherry Street NW, Atlanta GA 30332-0375, USA
Tomasz Stanisz
Complex Systems Theory Department, Institute of Nuclear Physics, Polish Academy of Sciences, ul. Radzikowskiego 152, 31-342 Kraków, Poland