Identifying Emerging Concepts in Large Corpora

2025-02-28
AI Summary
This study addresses the early detection of emerging concepts in large text corpora. The authors propose a detection method based on how heatmaps of the underlying embedding space change over time. Rather than relying on semantic drift or lexical frequency statistics, the method analyzes shifts in these heatmaps across time periods, which lets it identify concepts with high accuracy shortly after they originate and outperform common alternatives. Applied to U.S. Senate speeches from 1941 to 2015, the method suggests that the minority party is more active in introducing new concepts into Senate discourse, and it identifies specific concepts that closely correlate with senators' racial, ethnic, and gender identities. An implementation of the method is publicly available.

πŸ“ Abstract
We introduce a new method to identify emerging concepts in large text corpora. By analyzing changes in the heatmaps of the underlying embedding space, we are able to detect these concepts with high accuracy shortly after they originate, in turn outperforming common alternatives. We further demonstrate the utility of our approach by analyzing speeches in the U.S. Senate from 1941 to 2015. Our results suggest that the minority party is more active in introducing new concepts into the Senate discourse. We also identify specific concepts that closely correlate with the Senators' racial, ethnic, and gender identities. An implementation of our method is publicly available.
Problem

Research questions and friction points this paper is trying to address.

Detects emerging concepts in large text corpora.
Analyzes U.S. Senate speeches from 1941 to 2015.
Identifies concepts linked to Senators' identities.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing embedding space heatmap changes
Detecting emerging concepts with high accuracy
Publicly available method implementation
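
The general idea of comparing embedding-space heatmaps across time periods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each document or term has already been embedded and projected to a 2D point in the unit square, and the names `heatmap`, `emerging_cells`, `bins`, and `min_gain` are hypothetical. A cell of the grid is flagged as "emerging" when its share of the total mass grows by at least a chosen threshold between an earlier and a later period.

```python
from collections import Counter


def heatmap(points, bins=4, lo=0.0, hi=1.0):
    """Return a normalized 2D histogram as a dict mapping cell -> share of points.

    Assumes `points` are (x, y) pairs already projected into [lo, hi]^2.
    """
    counts = Counter()
    width = (hi - lo) / bins
    for x, y in points:
        # Clamp indices so points on or beyond the edges fall into border cells.
        i = min(max(int((x - lo) / width), 0), bins - 1)
        j = min(max(int((y - lo) / width), 0), bins - 1)
        counts[(i, j)] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()} if total else {}


def emerging_cells(earlier, later, bins=4, min_gain=0.2):
    """Return cells whose share of mass grew by at least `min_gain`.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    prev = heatmap(earlier, bins=bins)
    curr = heatmap(later, bins=bins)
    gains = {cell: share - prev.get(cell, 0.0) for cell, share in curr.items()}
    return sorted(cell for cell, gain in gains.items() if gain >= min_gain)


# Toy data: the later period has a new cluster near the bottom-right corner.
earlier = [(0.1, 0.1), (0.15, 0.2), (0.6, 0.6), (0.7, 0.65)]
later = [(0.1, 0.1), (0.6, 0.6), (0.9, 0.1), (0.95, 0.05), (0.85, 0.15)]
print(emerging_cells(earlier, later))  # prints [(3, 0)]
```

Normalizing each heatmap to shares rather than raw counts keeps periods with different numbers of documents comparable, which matters for a corpus spanning several decades.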