Cross-Source Reasoning-based Correction for Author Name Disambiguation

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Author name disambiguation in academic search is often hindered by cross-source inconsistencies and error propagation, and relying on manual annotation is prohibitively expensive. This work proposes CrossND, a framework that offers a new perspective: it treats inconsistent paper-author assignments across sources as a corrective signal, enabling automated and robust disambiguation. CrossND chains three stages. A chain-of-refinement pipeline cleans author profiles and estimates paper-author matching probabilities. Supervised fine-tuning with a probabilistic soft logic-based cross-correction module then infers which sources' assignments are wrong. Finally, test-time scaling stabilizes the predictions. The approach avoids costly expert annotation. On real-world datasets, the method consistently outperforms 17 baselines, indicating that cross-source reasoning improves both accuracy and robustness in author name disambiguation.
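The core signal is easiest to see on a toy example. The sketch below shows one way to locate cross-source inconsistencies for a single author profile: it flags every paper that some sources assign to the author while others do not. The input layout (a mapping from a source name to a set of paper IDs), the function name `find_inconsistent_assignments`, and the sample data are hypothetical illustrations. They are not CrossND's actual data model.

```python
from collections import defaultdict


def find_inconsistent_assignments(assignments: dict[str, set[str]]) -> dict[str, list[str]]:
    """For one author profile, return each paper that at least one source
    omits, mapped to the sorted list of sources that do assign it."""
    all_sources = set(assignments)
    supporters: dict[str, set[str]] = defaultdict(set)
    for source, papers in assignments.items():
        for paper in papers:
            supporters[paper].add(source)
    # A paper is consistent only if every source assigns it to this author.
    return {
        paper: sorted(sources)
        for paper, sources in sorted(supporters.items())
        if sources != all_sources
    }


# Hypothetical sources and paper IDs, for illustration only.
assignments = {
    "source_a": {"p1", "p2", "p3"},
    "source_b": {"p1", "p2"},
    "source_c": {"p1", "p3", "p4"},
}
print(find_inconsistent_assignments(assignments))
# {'p2': ['source_a', 'source_b'], 'p3': ['source_a', 'source_c'], 'p4': ['source_c']}
```

Only the disagreeing papers (here p2, p3, and p4) need to be re-examined. This is why inconsistency can act as a cheap substitute for expert labels: it tells the system where to look.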

📝 Abstract

Author name disambiguation is a critical challenge in academic search systems and is often addressed through from-scratch and real-time disambiguation approaches. However, current algorithms remain vulnerable to cumulative errors in paper-author assignments and overlook inconsistent assignments across different sources. Resorting to expert annotation is resource-intensive. To this end, this paper explores a new perspective for author name disambiguation: cross-source correction that leverages inconsistent assignments across sources. We propose CrossND, a full-stack framework that integrates data refinement, cross-source reasoning, and test-time scaling. First, a chain-of-refinement pipeline denoises author profiles and produces more accurate paper-author matching probabilities. Second, a supervised fine-tuning process incorporates these refined signals and a probabilistic soft logic-based cross-correction module to infer which sources' assignments are incorrect. Third, test-time scaling further improves the accuracy and robustness of the predictions. Experiments on real-world datasets show that CrossND consistently outperforms 17 baselines by leveraging cross-source reasoning without human intervention.
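The abstract does not say how test-time scaling is carried out. One common form is repeated sampling with majority voting, sometimes called self-consistency. The minimal sketch below assumes that form: it draws k stochastic yes/no judgments, such as "this source's assignment is incorrect", and returns the majority label together with the share of samples that agree with it. The names `scaled_judgment` and `noisy_judge` and the probability 0.7 are hypothetical. `noisy_judge` stands in for a sampled model call.

```python
import random
from typing import Callable


def scaled_judgment(
    sample_judgment: Callable[[random.Random], bool],
    k: int = 9,
    seed: int = 0,
) -> tuple[bool, float]:
    """Majority vote over k sampled yes/no judgments.

    Returns (label, agreement): label is True if more than half of the
    samples said True (ties resolve to False), and agreement is the share
    of samples that match the returned label.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = random.Random(seed)
    positives = sum(sample_judgment(rng) for _ in range(k))
    label = 2 * positives > k
    agreement = (positives if label else k - positives) / k
    return label, agreement


# Hypothetical stand-in for a sampled model call: flags the assignment
# as incorrect with probability 0.7.
def noisy_judge(rng: random.Random) -> bool:
    return rng.random() < 0.7


label, agreement = scaled_judgment(noisy_judge, k=15, seed=42)
print(label, round(agreement, 2))
```

Increasing k trades extra inference cost for a more stable decision, and the agreement score can serve as a rough confidence when deciding whether to overturn a source's assignment.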

Problem

Research questions and friction points this paper is trying to address.

author name disambiguation

cross-source reasoning

paper-author assignment

data inconsistency

academic search

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-source reasoning

author name disambiguation

probabilistic soft logic
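Probabilistic soft logic, listed among the innovations above, relaxes first-order rules to truth values in [0, 1] using Łukasiewicz logic. It penalizes each rule by a weighted hinge on its distance to satisfaction. The sketch below illustrates that mechanism for one cross-correction decision. The rule itself is hypothetical: paper p is in source A, it is not in source B, and its matching probability is low, therefore A's assignment is wrong. It is paired with a prior rule that assignments are not wrong. The rule, the weights, and `infer_wrong` are assumptions and are not the paper's actual rule set. Real PSL systems solve MAP inference jointly as a convex optimization. A grid search over a single variable is used here only for clarity.

```python
def l_and(*values: float) -> float:
    """Lukasiewicz conjunction: max(0, sum(values) - (n - 1))."""
    return max(0.0, sum(values) - (len(values) - 1))


def l_not(value: float) -> float:
    """Lukasiewicz negation."""
    return 1.0 - value


def distance_to_satisfaction(body: float, head: float) -> float:
    """A rule body -> head is fully satisfied when head >= body."""
    return max(0.0, body - head)


def infer_wrong(in_a: float, in_b: float, low_match: float,
                w_rule: float = 2.0, w_prior: float = 1.0,
                steps: int = 1000) -> float:
    """MAP value of the hypothetical atom WrongA(p) under two rules with
    squared hinge-loss potentials:
      w_rule:  InA(p) & !InB(p) & LowMatch(p) -> WrongA(p)
      w_prior: !WrongA(p)   (distance to satisfaction is WrongA(p) itself)
    """
    body = l_and(in_a, l_not(in_b), low_match)
    best_y, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        y = i / steps
        loss = (w_rule * distance_to_satisfaction(body, y) ** 2
                + w_prior * y ** 2)
        if loss < best_loss:
            best_y, best_loss = y, loss
    return best_y


# Sources disagree and the match is weak: body = 0.9, so WrongA is about
# w_rule * body / (w_rule + w_prior) = 0.6.
print(infer_wrong(in_a=1.0, in_b=0.0, low_match=0.9))
# Both sources agree: the rule body is 0, so the prior drives WrongA to 0.
print(infer_wrong(in_a=1.0, in_b=1.0, low_match=0.9))
```

With squared hinges the optimum has the closed form noted in the comment. The soft output says how strongly the evidence implicates source A rather than issuing a hard yes/no verdict.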