🤖 AI Summary
This study addresses the problem of systematic metadata discrepancies between OpenAlex and Web of Science (WoS) and their potential impact on bibliometric analyses. We compare consistency across four critical metadata dimensions—document type, publication year, language, and author count—using cross-database citation matching, rigorous data cleaning, and multidimensional quantitative consistency assessment. This method enables a comprehensive, empirical evaluation of metadata quality differences between these two major scholarly databases. Key findings reveal distinct error patterns: OpenAlex tends to overestimate author counts and misclassify document types, whereas WoS underrepresents non-English publications. Publication-year misalignment and language mislabeling further compound inter-database inconsistencies. These results provide empirical evidence and methodological guidance for database selection, interpretation of bibliometric indicators, and metadata curation in research evaluation and science policy.
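The comparison described above can be sketched in a few lines: match records from the two databases on a shared identifier, then compute per-field agreement rates. This is a minimal illustration, not the study's actual pipeline; the field names, DOI keys, and sample records are assumptions chosen for the example.

```python
# Hypothetical sketch of a cross-database consistency check:
# pair records by DOI, then measure per-field agreement for the
# four dimensions examined in the study.
FIELDS = ["doc_type", "year", "language", "n_authors"]

def match_by_doi(a, b):
    """Pair records sharing a DOI; each input maps DOI -> record dict."""
    return [(a[doi], b[doi]) for doi in a.keys() & b.keys()]

def agreement_rates(pairs):
    """Fraction of matched pairs whose values agree, per field."""
    rates = {}
    for f in FIELDS:
        same = sum(1 for x, y in pairs if x[f] == y[f])
        rates[f] = same / len(pairs) if pairs else 0.0
    return rates

# Illustrative records only, not real data from either database.
openalex = {
    "10.1/a": {"doc_type": "article", "year": 2020, "language": "en", "n_authors": 5},
    "10.1/b": {"doc_type": "review",  "year": 2021, "language": "en", "n_authors": 3},
}
wos = {
    "10.1/a": {"doc_type": "article", "year": 2020, "language": "en", "n_authors": 4},
    "10.1/b": {"doc_type": "review",  "year": 2021, "language": "en", "n_authors": 3},
}

pairs = match_by_doi(openalex, wos)
print(agreement_rates(pairs))  # n_authors disagrees for one of two pairs
```

In practice the study's matching and cleaning steps are far more involved (e.g., normalizing document-type vocabularies before comparison), but the per-dimension agreement rate is the core quantity being reported.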
📝 Abstract
Bibliometrics, whether used for research or research evaluation, relies on large multidisciplinary databases of research outputs and citation indices. The Web of Science (WoS) was the field's main supporting infrastructure for more than 30 years, until several new competitors emerged. OpenAlex, a bibliographic database launched in 2022, has distinguished itself through its openness and extensive coverage. While OpenAlex may reduce or eliminate barriers to accessing bibliometric data, one concern hindering its broader adoption for research and research evaluation is the quality of its metadata. This study assesses metadata quality in OpenAlex and WoS, focusing on document type, publication year, language, and number of authors. By documenting discrepancies and misattributions in metadata, this research seeks to raise awareness of data quality issues that could affect bibliometric research and evaluation outcomes.