ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers

📅 2025-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses automated assessment of scientific impact by proposing ForeCite, a framework that adapts pre-trained causal language models to long-horizon citation rate prediction. Methodologically, it attaches a lightweight linear regression head to a causal Transformer, fine-tunes the model to regress a paper's average monthly citation rate, and evaluates it under strict temporal hold-out validation; gradient-based saliency analysis is used to inspect which parts of the input drive predictions. On a dataset of over 900,000 biomedical papers published between 2000 and 2024, ForeCite achieves a Spearman correlation of ρ = 0.826, a 27-point improvement over the prior state of the art. Scaling-law analysis shows consistent performance gains with increasing model size and data volume.
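The temporal hold-out idea mentioned above can be illustrated with a minimal sketch. The field name `published`, the cutoff date, and the toy records are all assumptions made for illustration; the paper's actual split is not specified here.

```python
from datetime import date


def temporal_holdout(papers, cutoff):
    """Split papers so that every test paper is published on or after `cutoff`.

    Training only on earlier papers keeps information from the test period
    out of the model, which a random split would not guarantee.
    """
    train = [p for p in papers if p["published"] < cutoff]
    test = [p for p in papers if p["published"] >= cutoff]
    return train, test


# Hypothetical toy records; real inputs would also carry title and abstract text.
papers = [
    {"id": "a", "published": date(2005, 3, 1)},
    {"id": "b", "published": date(2019, 7, 15)},
    {"id": "c", "published": date(2022, 1, 10)},
    {"id": "d", "published": date(2023, 11, 2)},
]

train, test = temporal_holdout(papers, cutoff=date(2022, 1, 1))
print([p["id"] for p in train], [p["id"] for p in test])
```

The point of splitting by date rather than at random is that the evaluation then resembles real use, where the model must score papers newer than anything it was trained on.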

📝 Abstract

Predicting the future citation rates of academic papers is an important step toward the automation of research evaluation and the acceleration of scientific progress. We present $\textbf{ForeCite}$, a simple but powerful framework to append pre-trained causal language models with a linear head for average monthly citation rate prediction. Adapting transformers for regression tasks, ForeCite achieves a test correlation of $\rho = 0.826$ on a curated dataset of 900K+ biomedical papers published between 2000 and 2024, a 27-point improvement over the previous state-of-the-art. Comprehensive scaling-law analysis reveals consistent gains across model sizes and data volumes, while temporal holdout experiments confirm practical robustness. Gradient-based saliency heatmaps suggest a potentially undue reliance on titles and abstract texts. These results establish a new state-of-the-art in forecasting the long-term influence of academic research and lay the groundwork for the automated, high-fidelity evaluation of scientific contributions.
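To make the reported correlation concrete, here is a small dependency-free sketch of a rank correlation, assuming that ρ denotes Spearman's rank correlation between predicted and observed citation rates. The function names and the toy numbers are illustrative and are not taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted vs. observed monthly citation rates.
predicted = [0.1, 0.4, 0.35, 0.8]
observed = [0.0, 0.5, 0.3, 1.2]
rho = spearman(predicted, observed)
```

Because only the ordering matters, a model can score well on this metric even if its predicted rates are off in absolute scale, which is worth keeping in mind when reading a single correlation figure.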

Problem

Research questions and friction points this paper is trying to address.

Predict future citation rates of academic papers

Adapt pre-trained language models for citation prediction

Improve accuracy in forecasting research influence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts pre-trained causal language models to a regression task

Adds a linear head that predicts a paper's average monthly citation rate

Achieves a test correlation of ρ = 0.826 on 900K+ biomedical papers, 27 points above the previous state of the art
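The regression setup listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the target is assumed to be total citations divided by whole months since publication, and the head is assumed to read a single pooled hidden vector from the language model. All names and numbers are hypothetical.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from `start` to `end` (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def monthly_citation_rate(citations, published, observed):
    """Assumed target: citations per month since publication."""
    # Clamp to one month so very recent papers do not divide by zero.
    months = max(months_between(published, observed), 1)
    return citations / months


def linear_head(hidden_state, weights, bias):
    """Map one pooled hidden vector to a scalar prediction: w . h + b."""
    return sum(h * w for h, w in zip(hidden_state, weights)) + bias


# Hypothetical example: 120 citations over 60 months.
target = monthly_citation_rate(120, date(2020, 1, 1), date(2025, 1, 1))

# Hypothetical 3-dimensional hidden state; real models use far larger vectors.
prediction = linear_head([0.5, -1.0, 2.0], [0.2, 0.1, 0.4], bias=0.05)
```

During fine-tuning, the head's weights and the language model would be trained together to bring `prediction` close to `target`; the choice of loss and pooling strategy is left open here because the summary does not state them.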
