From Funding to Findings (FIND): An Open Database of NSF Awards and Research Outputs

📅 2025-10-11
🤖 AI Summary
This study addresses the challenge of linking public research funding, exemplified by the U.S. National Science Foundation (NSF), to downstream scientific outcomes, with the aim of improving transparency and rigor in funding evaluation. It introduces FIND (Funding-Impact NSF Database), a large-scale, structured, open-access dataset that systematically links NSF grant proposals to their resulting publications, including publication metadata and abstracts. Two proof-of-concept NLP applications illustrate its use: one analyzes whether the language of grant proposals can predict the subsequent citation impact of funded research, and the other uses large language models to extract scientific claims from both proposals and publications in order to measure how far funded projects deliver on their stated goals. FIND is intended as infrastructure for metascience, evidence-based funding policy, and research impact assessment.

📝 Abstract
Public funding plays a central role in driving scientific discovery. To better understand the link between research inputs and outputs, we introduce FIND (Funding-Impact NSF Database), an open-access dataset that systematically links NSF grant proposals to their downstream research outputs, including publication metadata and abstracts. The primary contribution of this project is the creation of a large-scale, structured dataset that enables transparency, impact evaluation, and metascience research on the returns to public funding. To illustrate the potential of FIND, we present two proof-of-concept NLP applications. First, we analyze whether the language of grant proposals can predict the subsequent citation impact of funded research. Second, we leverage large language models to extract scientific claims from both proposals and resulting publications, allowing us to measure the extent to which funded projects deliver on their stated goals. Together, these applications highlight the utility of FIND for advancing metascience, informing funding policy, and enabling novel AI-driven analyses of the scientific process.
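
The core linking step described in the abstract can be pictured as a join between award records and publication records. The following is a minimal sketch only: the field names (`award_id`, `doi`, `citations`) and the toy records are assumptions made for illustration and are not FIND's actual schema or data.

```python
from collections import defaultdict

# Toy records for illustration only; field names are hypothetical,
# not the actual FIND schema.
awards = [
    {"award_id": "A001", "title": "Example award one"},
    {"award_id": "A002", "title": "Example award two"},
]
publications = [
    {"doi": "10.0000/example.1", "award_id": "A001", "citations": 12},
    {"doi": "10.0000/example.2", "award_id": "A001", "citations": 3},
    {"doi": "10.0000/example.3", "award_id": "A999", "citations": 7},
]


def link_awards_to_publications(awards, publications):
    """Attach to each award the list of publications that cite its ID."""
    pubs_by_award = defaultdict(list)
    for pub in publications:
        pubs_by_award[pub["award_id"]].append(pub)
    # Awards with no matching publication get an empty list.
    return [
        {**award, "publications": pubs_by_award.get(award["award_id"], [])}
        for award in awards
    ]


linked = link_awards_to_publications(awards, publications)
for record in linked:
    total = sum(p["citations"] for p in record["publications"])
    print(record["award_id"], len(record["publications"]), total)
```

Publications whose award ID matches no known award (here `A999`) are simply left out of the linked view; a real pipeline would likely log them for manual review.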
Problem

Research questions and friction points this paper is trying to address.

Linking NSF grants to research outputs for transparency
Analyzing grant language to predict citation impact
Measuring project goal achievement using AI extraction
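
One way to picture the goal-achievement measurement is as the fraction of proposal claims that find a matching claim in the resulting publications. The sketch below assumes the claims have already been extracted (the paper uses large language models for that step) and substitutes a simple token-overlap (Jaccard) score for semantic matching. The function names, the example claims, and the 0.5 threshold are all hypothetical and are not the authors' method.

```python
import re


def tokens(text):
    """Lowercase a claim and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Token-set Jaccard similarity; a crude stand-in for semantic matching."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def delivery_rate(proposal_claims, publication_claims, threshold=0.5):
    """Share of proposal claims matched by at least one publication claim.

    The threshold is an arbitrary illustrative value.
    """
    if not proposal_claims:
        return 0.0
    delivered = sum(
        1
        for claim in proposal_claims
        if any(jaccard(claim, pub) >= threshold for pub in publication_claims)
    )
    return delivered / len(proposal_claims)


# Made-up claims for illustration only.
proposal_claims = [
    "we will develop a faster sorting algorithm",
    "we will release an open dataset",
]
publication_claims = ["we develop a faster sorting algorithm"]

rate = delivery_rate(proposal_claims, publication_claims)
print(rate)
```

Token overlap misses paraphrases and negations, so an embedding- or LLM-based comparison would be needed for anything beyond a toy example.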
Innovation

Methods, ideas, or system contributions that make the work stand out.

Links NSF grants to research outputs
Creates large-scale structured dataset
Uses NLP for citation and claim analysis
Kazimier Smith
Massachusetts Institute of Technology, Cambridge, MA, USA
Yucheng Lu
IT-Universitetet i København
Medical image analysis, image processing, deep learning
Qiaochu Fan
New York University, New York, NY, USA