🤖 AI Summary
This paper addresses the challenge of "paragraph–citation matching" in academic writing: accurately identifying the most appropriate citation for a given long paragraph from a large pool of highly similar candidate papers. We propose a zero-shot citation discovery method comprising two stages: (1) explicit extraction of interpretable semantic relation features (e.g., method–task, model–data) from the paragraph to construct relation-aware vectors for coarse-grained retrieval, and (2) fine-grained re-ranking and verification of the candidates with a large language model. By avoiding reliance on labeled training data, the approach mitigates the ambiguity arising from paragraph length and candidate homogeneity. Experiments on the SCIDOCA 2025 dataset show that the method significantly improves Top-1 citation prediction accuracy, outperforming baselines by an average of 12.7%, and demonstrates strong cross-domain generalization.
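The relation-feature extraction in stage (1) can be sketched as follows. This is a minimal illustration that assumes relations surface as simple "X for Y" / "X on Y" phrase patterns; the paper's actual extractor, relation schema, and vector construction are not given here, so the patterns and function names below are hypothetical.

```python
# Illustrative sketch of stage (1): mining relation triples from a
# paragraph and collecting them into a relation-aware representation.
# The two toy patterns loosely mirror the method-task and model-data
# relation types mentioned in the summary; a real extractor would be
# far more sophisticated.
import re

RELATION_PATTERNS = [
    (r"(\w+) for (\w+)", "method-task"),   # e.g., "BERT for classification"
    (r"(\w+) on (\w+)", "model-data"),     # e.g., "train on Wikipedia"
]

def extract_relations(paragraph):
    """Return (relation_type, head, tail) triples found in the text."""
    triples = []
    for pattern, rel in RELATION_PATTERNS:
        for head, tail in re.findall(pattern, paragraph.lower()):
            triples.append((rel, head, tail))
    return triples

def relation_vector(paragraph):
    """Relation-aware 'vector': here simply the set of extracted triples,
    so two paragraphs can be compared by triple overlap."""
    return set(extract_relations(paragraph))
```

Representing the paragraph by its relations, rather than its full token sequence, is what lets the coarse retrieval stage sidestep the length and homogeneity problems the summary describes.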
📝 Abstract
The Citation Discovery Shared Task focuses on predicting the correct citation for a given paragraph from a pool of candidate papers. The main challenges stem from the length of the paragraphs and the high similarity among candidate abstracts, which make it difficult to determine the exact paper to cite. To address this, we develop a system that first retrieves the top-k most similar abstracts based on relational features extracted from the paragraph. From this subset, we leverage a Large Language Model (LLM) to identify the most relevant citation. We evaluate our framework on the training dataset provided by the SCIDOCA 2025 organizers, demonstrating its effectiveness in citation prediction.
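The retrieve-then-rerank pipeline described in the abstract can be sketched as below. A plain bag-of-words cosine similarity stands in for the paper's relation-aware vectors, and the LLM verification step is reduced to a placeholder hook, so the scoring, data, and function names are all assumptions, not the authors' implementation.

```python
# Hedged sketch of the two-stage system: (1) coarse top-k retrieval over
# candidate abstracts, (2) a stubbed-out LLM re-ranking step.
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words vector; the paper uses relation-aware features instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(paragraph, candidates, k=3):
    """Stage 1: rank candidate abstracts by similarity to the paragraph."""
    pv = vectorize(paragraph)
    scored = [(cosine(pv, vectorize(c)), c) for c in candidates]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [c for _, c in scored[:k]]

def rerank_with_llm(paragraph, shortlist):
    """Stage 2 placeholder: a real system would prompt an LLM to pick
    the single best citation from the shortlist."""
    return shortlist[0]  # stub: keep the top retrieval result
```

The point of the split is efficiency: the cheap first stage prunes the large candidate pool to a short list, so the expensive LLM call only has to discriminate among a handful of near-duplicates.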