🤖 AI Summary
This study addresses the prediction of unexpectedly emerging interdisciplinary research directions, a central challenge in the science of science. Method: We introduce FOS (Future Of Science), a large-scale temporal graph benchmark for scientific frontier forecasting covering 1827–2024. It comprises 65,027 subfields and their annual co-occurrence relationships, with a focus on predicting "first-time" interdisciplinary associations. Edges are timestamped with the publication year, nodes carry semantic embeddings of long-form textual field descriptions, and edges carry temporal and topological descriptors. State-of-the-art temporal graph architectures are evaluated under multiple negative-sampling regimes. Contribution/Results: Experiments show that embedding long-form field descriptions significantly improves prediction accuracy, that different model classes excel under different evaluation settings, and that top-ranked predictions align with field pairings that emerge in subsequent years of publications. FOS is released with its temporal data splits and evaluation code as a reproducible benchmark for predicting scientific frontiers.
📝 Abstract
Interdisciplinary scientific breakthroughs mostly emerge unexpectedly, and forecasting the formation of novel research fields remains a major challenge. We introduce FOS (Future Of Science), a comprehensive time-aware, graph-based benchmark that reconstructs annual co-occurrence graphs of 65,027 research sub-fields (spanning 19 general domains) over the period 1827–2024. In these graphs, an edge denotes the co-occurrence of two fields in a single publication and is timestamped with that publication's year. Nodes are enriched with semantic embeddings, and edges are characterized by temporal and topological descriptors. We formulate the prediction of new field-pair linkages as a temporal link-prediction task, emphasizing the "first-time" connections that signify pioneering interdisciplinary directions. Through extensive experiments, we evaluate a suite of state-of-the-art temporal graph architectures under multiple negative-sampling regimes and show that (i) embedding long-form textual descriptions of fields significantly boosts prediction accuracy, and (ii) distinct model classes excel under different evaluation settings. Case analyses show that top-ranked link predictions on FOS align with field pairings that emerge in subsequent years of academic publications. We publicly release FOS, along with its temporal data splits and evaluation code, to establish a reproducible benchmark for advancing research in predicting scientific frontiers.
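The edge construction and the "first-time" link target described in the abstract can be illustrated with a minimal Python sketch. It is not the released FOS code: the function names, the `(year, fields)` input format, the toy publications, and the single `split_year` cutoff are all assumptions made for illustration. It only shows how timestamped co-occurrence edges could be derived from publications and how field pairs that first co-occur after a cutoff might be selected as prediction targets.

```python
from itertools import combinations


def build_timestamped_edges(publications):
    """Turn (year, fields) records into undirected timestamped edges.

    Each pair of distinct fields tagged on the same publication yields
    one edge (u, v, year) with u < v, so a pair has a single canonical key.
    The input format is an assumption, not the FOS data schema.
    """
    edges = []
    for year, fields in publications:
        for u, v in combinations(sorted(set(fields)), 2):
            edges.append((u, v, year))
    edges.sort(key=lambda edge: edge[2])  # chronological order
    return edges


def first_cooccurrence_year(edges):
    """Map each field pair to the earliest year in which it co-occurs."""
    first_seen = {}
    for u, v, year in edges:
        pair = (u, v)
        if pair not in first_seen or year < first_seen[pair]:
            first_seen[pair] = year
    return first_seen


def first_time_targets(first_seen, split_year):
    """Pairs whose first co-occurrence falls at or after split_year.

    These are the 'first-time' links: pairs never observed together
    in the history before the (hypothetical) cutoff.
    """
    return sorted(pair for pair, year in first_seen.items() if year >= split_year)


# Toy data, invented for illustration only.
publications = [
    (2001, ["genomics", "statistics"]),
    (2003, ["genomics", "statistics", "machine learning"]),
    (2005, ["machine learning", "ecology"]),
]

edges = build_timestamped_edges(publications)
first_seen = first_cooccurrence_year(edges)
targets = first_time_targets(first_seen, split_year=2004)
print(targets)
```

With this toy input, the only pair that first appears from 2004 onward is ecology with machine learning; the genomics and statistics pair recurs in 2003 but is not a first-time link because it was already observed in 2001.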