🤖 AI Summary
This paper addresses the challenge of "paragraph–citation matching" in academic writing: accurately identifying the most appropriate citation for a given long paragraph from a large pool of highly similar candidate papers. We propose a zero-shot citation discovery method comprising two stages: (1) explicit extraction of interpretable semantic relation features (e.g., method–task, model–data) from the paragraph to construct relation-aware vectors for coarse-grained retrieval, and (2) fine-grained re-ranking and verification of the candidates with a large language model. By avoiding reliance on labeled training data, the approach mitigates the ambiguity arising from paragraph length and candidate homogeneity. Experiments on the SCIDOCA 2025 dataset show that the method significantly improves Top-1 citation prediction accuracy, outperforming baselines by an average of 12.7%, and demonstrates strong cross-domain generalization.
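The relation-feature extraction in stage (1) can be sketched as follows. This is a minimal illustration that assumes relations surface as simple "X for Y" / "X on Y" phrase patterns; the paper's actual extractor, relation schema, and vector construction are not given here, so the patterns and function names below are hypothetical.

```python
# Illustrative sketch of stage (1): mining relation triples from a
# paragraph and collecting them into a relation-aware representation.
# The two toy patterns loosely mirror the method-task and model-data
# relation types mentioned in the summary; a real extractor would be
# far more sophisticated.
import re

RELATION_PATTERNS = [
    (r"(\w+) for (\w+)", "method-task"),   # e.g., "BERT for classification"
    (r"(\w+) on (\w+)", "model-data"),     # e.g., "train on Wikipedia"
]

def extract_relations(paragraph):
    """Return (relation_type, head, tail) triples found in the text."""
    triples = []
    for pattern, rel in RELATION_PATTERNS:
        for head, tail in re.findall(pattern, paragraph.lower()):
            triples.append((rel, head, tail))
    return triples

def relation_vector(paragraph):
    """Relation-aware 'vector': here simply the set of extracted triples,
    so two paragraphs can be compared by triple overlap."""
    return set(extract_relations(paragraph))
```

Representing the paragraph by its relations, rather than its full token sequence, is what lets the coarse retrieval stage sidestep the length and homogeneity problems the summary describes.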
📝 Abstract
The Citation Discovery Shared Task focuses on predicting the correct citation for a given paragraph from a pool of candidate papers. The main challenges stem from the length of the paragraphs and the high similarity among candidate abstracts, which make it difficult to determine the exact paper to cite. To address this, we develop a system that first retrieves the top-k most similar abstracts based on relational features extracted from the paragraph. From this subset, we leverage a Large Language Model (LLM) to identify the most relevant citation. We evaluate our framework on the training dataset provided by the SCIDOCA 2025 organizers, demonstrating its effectiveness in citation prediction.
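The retrieve-then-rerank pipeline described in the abstract can be sketched as below. A plain bag-of-words cosine similarity stands in for the paper's relation-aware vectors, and the LLM verification step is reduced to a placeholder hook, so the scoring, data, and function names are all assumptions, not the authors' implementation.

```python
# Hedged sketch of the two-stage system: (1) coarse top-k retrieval over
# candidate abstracts, (2) a stubbed-out LLM re-ranking step.
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words vector; the paper uses relation-aware features instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(paragraph, candidates, k=3):
    """Stage 1: rank candidate abstracts by similarity to the paragraph."""
    pv = vectorize(paragraph)
    scored = [(cosine(pv, vectorize(c)), c) for c in candidates]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for _, c in scored[:k]]

def rerank_with_llm(paragraph, shortlist):
    """Stage 2 placeholder: a real system would prompt an LLM to pick
    the single best citation from the shortlist."""
    return shortlist[0]  # stub: keep the top retrieval result
```

The point of the split is efficiency: the cheap first stage prunes the large candidate pool to a short list, so the expensive LLM call only has to discriminate among a handful of near-duplicates.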