Team LA at SCIDOCA shared task 2025: Citation Discovery via relation-based zero-shot retrieval

📅 2025-06-23
🤖 AI Summary
This paper addresses the challenge of “paragraph–citation matching” in academic writing—i.e., accurately identifying the most appropriate citation from a large pool of highly similar candidate papers for a given long paragraph. We propose a zero-shot citation discovery method comprising two stages: (1) explicit extraction of interpretable semantic relation features (e.g., method–task, model–data) from the paragraph to construct relation-aware vectors for coarse-grained retrieval; and (2) fine-grained re-ranking and verification of candidates using a large language model. By avoiding reliance on labeled training data, our approach mitigates ambiguity arising from paragraph length and candidate homogeneity. Experiments on the SCIDOCA 2025 dataset show that our method significantly improves Top-1 citation prediction accuracy—outperforming baselines by an average of 12.7%—and demonstrates strong cross-domain generalization capability.

📝 Abstract
The Citation Discovery Shared Task focuses on predicting the correct citation from a given candidate pool for a given paragraph. The main challenges stem from the length of the abstract paragraphs and the high similarity among candidate abstracts, making it difficult to determine the exact paper to cite. To address this, we develop a system that first retrieves the top-k most similar abstracts based on extracted relational features from the given paragraph. From this subset, we leverage a Large Language Model (LLM) to accurately identify the most relevant citation. We evaluate our framework on the training dataset provided by the SCIDOCA 2025 organizers, demonstrating its effectiveness in citation prediction.
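The two-stage flow described in the abstract (retrieve the top-k most similar abstracts, then let an LLM pick one) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how relational features are extracted or which LLM is used, so `featurize` substitutes plain token counts for the relational features, and `choose` is a hypothetical callable standing in for the LLM step. All function names here are assumptions made for the example.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def featurize(text: str) -> Counter:
    # ASSUMPTION: lowercase token counts stand in for the relational
    # features the paper extracts; the real extractor is not described here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(paragraph: str, candidates: Dict[str, str], k: int) -> List[str]:
    # Stage 1: rank candidate abstracts by similarity to the paragraph
    # and keep the ids of the k best.
    query = featurize(paragraph)
    ranked = sorted(
        candidates,
        key=lambda cid: cosine(query, featurize(candidates[cid])),
        reverse=True,
    )
    return ranked[:k]


def discover_citation(
    paragraph: str,
    candidates: Dict[str, str],
    k: int,
    choose: Callable[[str, Dict[str, str]], str],
) -> str:
    # Stage 2: hand the shortlist to `choose` (hypothetical stand-in for
    # the LLM), which returns the id of the candidate it judges most relevant.
    shortlist = top_k(paragraph, candidates, k)
    return choose(paragraph, {cid: candidates[cid] for cid in shortlist})


if __name__ == "__main__":
    pool = {
        "p1": "graph neural network for molecule property prediction",
        "p2": "transformer model for machine translation",
        "p3": "convolutional network for image classification",
    }
    text = "we use a transformer model for translation"
    # Trivial chooser for demonstration: take the first shortlisted id.
    print(discover_citation(text, pool, 2, lambda p, c: next(iter(c))))
```

Keeping the second stage behind a plain callable means the retrieval step can be tested on its own, and any LLM client can be plugged in later without changing the shortlist logic.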
Problem

Research questions and friction points this paper is trying to address.

Predict correct citation from candidate pool
Address challenges of long abstract paragraphs
Handle high similarity among candidate abstracts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieves top-k abstracts via relational features
Uses LLM for precise citation identification
Demonstrates effectiveness on the SCIDOCA 2025 training dataset
Authors

Trieu An
Japan Advanced Institute of Science and Technology, Japan
Long Nguyen
Graduate Student, Carnegie Mellon University
biological and biomedical sciences, digital pathology, computational microscopy
Minh Le Nguyen
Japan Advanced Institute of Science and Technology, Japan