🤖 AI Summary
This study addresses the challenge of quantifying causal relationships between public research funding, exemplified by the U.S. National Science Foundation (NSF), and downstream scientific outcomes, to enhance transparency and rigor in funding evaluation. We introduce FIND, the first large-scale, structured, open database that systematically links NSF grant proposals with their resulting publications’ metadata, abstracts, and citation networks. Methodologically, we integrate natural language processing and large language models to automatically extract scientific claims from proposals and their resulting publications, align those claims semantically across the two texts, and predict citation impact. Empirical results demonstrate that proposal-level textual features significantly predict subsequent paper citation counts; moreover, our approach enables scalable, automated measurement of “promise–delivery” consistency. FIND establishes foundational infrastructure and methodology for metascience, evidence-based funding policy design, and robust research impact assessment.
📝 Abstract
Public funding plays a central role in driving scientific discovery. To better understand the link between research inputs and outputs, we introduce FIND (Funding-Impact NSF Database), an open-access dataset that systematically links NSF grant proposals to their downstream research outputs, including publication metadata and abstracts. The primary contribution of this project is the creation of a large-scale, structured dataset that enables transparency, impact evaluation, and metascience research on the returns to public funding. To illustrate the potential of FIND, we present two proof-of-concept NLP applications. First, we analyze whether the language of grant proposals can predict the subsequent citation impact of funded research. Second, we leverage large language models to extract scientific claims from both proposals and resulting publications, allowing us to measure the extent to which funded projects deliver on their stated goals. Together, these applications highlight the utility of FIND for advancing metascience, informing funding policy, and enabling novel AI-driven analyses of the scientific process.