🤖 AI Summary
This study addresses the lack of interpretable temporal features for characterizing how human- and AI-generated language differ in the temporal distribution of semantic content. The authors propose a semantic timescale analysis framework based on autocorrelation windows (ACW), which transforms time-stamped text into semantic time series. Semantic fluctuations are quantified using WordNet depth (as a measure of semantic specificity) and SBERT embeddings (for contextual similarity). The approach is validated against shuffled controls that disrupt word order and word duration. Results show that segments with longer ACW-0 values are enriched with generic vocabulary, whereas segments with shorter ACW-0 values contain more specific terms, a pattern that is strongly attenuated under shuffling. These findings indicate that ACW captures the temporal organization of semantic content beyond static lexical distributions, offering a new perspective for comparing human and AI language generation.
📝 Abstract
Spoken language, whether produced by humans or large language models (LLMs), unfolds over time with varying semantic content. However, we still lack simple, interpretable time-series features that capture how generic versus specific content is distributed over time, and that can be used to compare human and AI-generated speech. We introduce a semantic-timescale analysis pipeline that turns word-level transcripts with timestamps into semantic time series. For each spoken narrative, we compute (i) semantic specificity using WordNet-based word depth and (ii) contextual similarity using SBERT embeddings, and we quantify their temporal dependence using autocorrelation-window measures (ACW-0 and related metrics). We then compare original speech to multiple shuffled controls that selectively disrupt lexical identity, temporal order, and word duration. Across human-read autobiographical narratives, TTS readings, and LLM-generated texts rendered with TTS, we find that segments with longer ACW-0 in the semantic time series tend to contain more generic vocabulary, whereas segments with shorter ACW-0 are enriched in more specific words. These associations are strongly attenuated or abolished when word order and timing are randomized, indicating that ACW-based measures capture non-trivial temporal organization of semantic content beyond static lexical distributions. Our results suggest that ACW-based semantic timescales are a useful family of features for analyzing and comparing the temporal structure of human and AI-generated speech.
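To make the ACW-0 measure concrete, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not define the metric. The sketch assumes the common convention that ACW-0 is the first lag at which the normalized autocorrelation function reaches or crosses zero, and it assumes the semantic time series has already been resampled onto an evenly spaced grid with step `dt`. The function names (`autocorrelation`, `acw0`, `shuffle_control`) are hypothetical, and the shuffle shown is a plain order permutation, which is simpler than the selective controls described above.

```python
import numpy as np


def autocorrelation(x):
    """Normalized (biased) autocorrelation of a 1-D series for lags 0..n-1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # np.correlate in "full" mode returns lags -(n-1)..(n-1); keep lags >= 0.
    acf = np.correlate(x, x, mode="full")[n - 1:]
    if acf[0] == 0:
        raise ValueError("constant series has undefined autocorrelation")
    return acf / acf[0]


def acw0(x, dt=1.0):
    """ACW-0 under the assumed definition: first lag with ACF <= 0, times dt.

    Returns NaN if the autocorrelation never reaches zero.
    """
    acf = autocorrelation(x)
    crossings = np.flatnonzero(acf <= 0)
    if crossings.size == 0:
        return float("nan")
    return float(crossings[0] * dt)


def shuffle_control(x, seed=0):
    """Simplified control (assumption): permute values to destroy temporal order."""
    rng = np.random.default_rng(seed)
    return rng.permutation(np.asarray(x, dtype=float))


# Illustrative input only: a slowly varying synthetic series, not real data.
t = np.arange(400)
slow = np.sin(2 * np.pi * t / 40)
print("ACW-0 original:", acw0(slow))
print("ACW-0 shuffled:", acw0(shuffle_control(slow)))
```

A shuffled series keeps the same set of values but loses its slow temporal structure, so its autocorrelation typically drops toward zero within the first few lags. Comparing the two values mirrors, in a very reduced form, the logic of testing whether an effect survives once temporal order is destroyed.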