🤖 AI Summary
Emerging technology domains lack systematic mapping and classification, which hinders comprehensive technology foresight and policy formulation. Method: This paper introduces Cosmos 1.0, the first multidimensional knowledge graph covering 23,544 technologies, built on a modeling framework that integrates *technology meta-classification*, *semantic embedding*, and *multidimensional assessment*. It proposes a hierarchical taxonomy (ET3/ET7), generates 100-dimensional semantic embeddings using a BERT variant, fuses heterogeneous data from Wikipedia, OpenAlex, and Google Scholar for cross-platform knowledge linking, and trains a supervised term classifier that achieves 96.2% accuracy on the manually curated ET100 benchmark. Contribution/Results: The authors release the open-source ET23k dataset and four interpretable evaluation indices (Awareness, Generality, Deeptech, and Age) with a 0.89 correlation to expert assessments, providing quantitative support for technological situational awareness, evolutionary analysis, and evidence-based policymaking.
📝 Abstract
This paper describes a novel methodology for mapping the universe of emerging technologies. It draws on source data that contain a rich diversity and breadth of contemporary knowledge to create a new dataset and multiple indices that provide new insights into these technologies. The Cosmos 1.0 dataset is a comprehensive collection of 23,544 technologies (ET23k) structured into a hierarchical model. Technologies are categorised into three meta clusters (ET3) and seven theme clusters (ET7) and enhanced by 100-dimensional embedding vectors. Within the cosmos, we manually verify 100 emerging technologies, called the ET100. The dataset is enriched with additional indices specifically developed to assess the landscape of emerging technologies, including the Technology Awareness Index, Generality Index, Deeptech, and Age of Tech Index. The dataset incorporates extensive metadata sourced from Wikipedia and linked data from third-party sources such as Crunchbase, Google Books, OpenAlex and Google Scholar, which are used to validate the relevance and accuracy of the constructed indices. Moreover, we trained a classifier to identify whether entries are developed "technology" or technology-related "terms".
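To make the hierarchical structure concrete, the following is a minimal sketch of how a single ET23k entry might be represented and compared. It is not the released schema: the class name `TechRecord`, its field names, the integer cluster labels, and the assumption that each technology carries exactly one ET3 and one ET7 label are all hypothetical. Only the cluster counts (3 and 7) and the embedding size (100) come from the abstract. Cosine similarity is used here as one common way to compare embedding vectors, not necessarily the measure the authors use.

```python
import math
from dataclasses import dataclass

# Values taken from the abstract.
EMBEDDING_DIM = 100   # size of each embedding vector
N_META = 3            # ET3 meta clusters
N_THEME = 7           # ET7 theme clusters


@dataclass(frozen=True)
class TechRecord:
    """Hypothetical record layout; field names are illustrative only."""
    name: str
    meta_cluster: int    # assumed label in 0..2 (ET3)
    theme_cluster: int   # assumed label in 0..6 (ET7)
    embedding: tuple     # 100 floats

    def __post_init__(self):
        # Reject records that do not fit the sizes stated in the abstract.
        if not 0 <= self.meta_cluster < N_META:
            raise ValueError("meta_cluster out of range")
        if not 0 <= self.theme_cluster < N_THEME:
            raise ValueError("theme_cluster out of range")
        if len(self.embedding) != EMBEDDING_DIM:
            raise ValueError("embedding must have 100 dimensions")


def cosine_similarity(a: TechRecord, b: TechRecord) -> float:
    """Cosine of the angle between the two records' embedding vectors."""
    dot = sum(x * y for x, y in zip(a.embedding, b.embedding))
    norm_a = math.sqrt(sum(x * x for x in a.embedding))
    norm_b = math.sqrt(sum(y * y for y in b.embedding))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With records of this shape, technologies in the same theme cluster could be ranked by embedding similarity, although the actual column names and label encoding should be taken from the released dataset.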