🤖 AI Summary
This study investigates how the average shortest path length differs between Chinese and English word-adjacency networks and how sensitive it is to punctuation. Unconventionally, punctuation marks are treated as ordinary nodes during network construction. This choice is motivated in part by the authors' earlier finding that punctuation behaves like words in a Zipfian analysis. Combining complex-network analysis with a growing network model, the study examines how the topology of literary texts from several periods in both languages, including translations, evolves as the networks grow. When punctuation is included, Chinese and English networks show similar asymptotic behavior in average shortest path length. When punctuation is excluded, the value becomes sizably larger for Chinese. The model agrees satisfactorily with the empirical data, and the findings underscore the role of punctuation in shaping the structural properties of language networks.
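To make "punctuation marks as ordinary nodes" concrete, here is a minimal sketch, not the authors' code. It assumes English text split with a simple regex, where a punctuation mark is any single non-space, non-word character. Each token is linked to the next one by an undirected, unweighted edge. For the no-punctuation variant, punctuation tokens are simply dropped and the words on either side become adjacent. Chinese text would first need word segmentation, which is not shown. The helper names `tokenize`, `adjacency_network`, and `average_shortest_path_length` are hypothetical.

```python
import re
from collections import deque

# Assumption: a token is a run of word characters or a single
# non-space, non-word character (treated as a punctuation mark).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text, keep_punctuation=True):
    """Lower-case tokens; optionally drop punctuation tokens (assumed variant)."""
    tokens = TOKEN_RE.findall(text.lower())
    if not keep_punctuation:
        tokens = [t for t in tokens if re.match(r"\w", t)]
    return tokens


def adjacency_network(tokens):
    """Undirected, unweighted graph linking each token to the next one."""
    graph = {t: set() for t in tokens}  # every distinct token is a node
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops such as an immediately repeated word
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_shortest_path_length(graph):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    text = "The cat sat. The dog sat, and the cat ran."
    for keep in (True, False):
        g = adjacency_network(tokenize(text, keep_punctuation=keep))
        print(f"punctuation={keep}: N={len(g)}, L={average_shortest_path_length(g):.3f}")
```

On the toy sentence, keeping punctuation adds "." and "," as two extra nodes (8 versus 6). Frequent marks can then act as hubs that connect otherwise distant words.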
📝 Abstract
Complex networks provide powerful tools for analyzing and understanding the intricate structures present in various systems, including natural language. Here, we analyze the topology of growing word-adjacency networks constructed from Chinese and English literary works written in different periods. Unconventionally, instead of considering dictionary words only, we also include punctuation marks as if they were ordinary words. Our approach rests on two arguments: (1) punctuation carries genuine information related to emotional state, allows for logical grouping of content, provides a pause in reading, and facilitates understanding by avoiding ambiguity; and (2) our previous works have shown that punctuation marks behave like words in a Zipfian analysis and, if considered together with regular words, can improve authorship attribution in stylometric studies. We focus on the functional dependence of the average shortest path length L(N) on the network size N for different epochs and individual novels in their original language, as well as for translations of selected novels into the other language. We approximate the empirical results with a growing network model and obtain satisfactory agreement between the two. We also observe that L(N) behaves asymptotically similarly for both languages if punctuation marks are included but becomes sizably larger for Chinese if punctuation marks are neglected.
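The abstract tracks L(N) as a function of the network size N while the text is read. The sketch below shows one way such an empirical curve could be recorded. It assumes tokens are added one at a time, each linked to its predecessor, and L is measured at the moment the node count N first reaches a chosen checkpoint. It does not reproduce the growing network model the authors fit to these curves, since the abstract does not specify that model. `growth_curve` and its `checkpoints` parameter are hypothetical.

```python
from collections import deque


def average_shortest_path_length(graph):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def growth_curve(tokens, checkpoints):
    """Record (N, L(N)) each time the node count N first reaches a checkpoint.

    Hypothetical helper: tokens arrive in reading order and each one is linked
    to its predecessor (undirected edges, self-loops skipped).
    """
    graph = {}
    targets = sorted(c for c in set(checkpoints) if c >= 1)
    curve = []
    prev = None
    for tok in tokens:
        graph.setdefault(tok, set())
        if prev is not None and prev != tok:
            graph[prev].add(tok)
            graph[tok].add(prev)
        prev = tok
        # N grows by at most one per token, so each checkpoint is hit exactly once.
        if targets and len(graph) == targets[0]:
            curve.append((len(graph), average_shortest_path_length(graph)))
            targets.pop(0)
    return curve


if __name__ == "__main__":
    # Punctuation marks are kept as ordinary tokens here.
    tokens = "the cat sat . the dog sat , and the cat ran .".split()
    for n, length in growth_curve(tokens, checkpoints=[2, 4, 6, 8]):
        print(n, round(length, 3))
```

This sketch recomputes every BFS from scratch at each checkpoint. That cost is acceptable for toy inputs. For novel-length texts, logarithmically spaced checkpoints or a dedicated graph library would likely be needed.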