🤖 AI Summary
Predicting the success of early-stage startups is highly challenging due to data scarcity, and traditional machine learning approaches often underperform in such low-data regimes. This work proposes a training-free in-context learning (ICL) framework that integrates k-nearest-neighbor retrieval with large language models to enable effective prediction. By dynamically selecting the most relevant labeled examples via similarity matching on startup profile features, the method constructs informative prompts without requiring model fine-tuning. Evaluated on real-world Crunchbase data, the approach outperforms conventional supervised learning and standard ICL baselines using only 50 labeled examples, demonstrating strong few-shot predictive capability.
📝 Abstract
Venture capital (VC) investments in early-stage startups that end up being successful can yield high returns. However, predicting early-stage startup success remains challenging due to data scarcity (e.g., many VC firms have information about only a few dozen early-stage startups and whether they were successful). This limits the effectiveness of traditional machine learning methods that rely on large labeled datasets for model training. To address this challenge, we propose an in-context learning framework for startup success prediction using large language models (LLMs) that requires no model training and leverages only a small set of labeled startups as demonstration examples. Specifically, we propose a novel k-nearest-neighbor-based in-context learning framework, called kNN-ICL, which selects the most relevant past startups as examples based on similarity. Using real-world profiles from Crunchbase, we find that the kNN-ICL approach achieves higher prediction accuracy than supervised machine learning baselines and vanilla in-context learning. Further, we study how performance varies with the number of in-context examples and find that a high balanced accuracy can be achieved with as few as 50 examples. Overall, we demonstrate that in-context learning can serve as a decision-making tool for VC firms operating in data-scarce environments.
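The core retrieve-then-prompt idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature vectors, profile strings, and function names below are hypothetical stand-ins for whatever representation of Crunchbase profiles the authors actually use, and the final prompt would be sent to an LLM rather than answered here.

```python
import numpy as np

def knn_select(query_vec, example_vecs, k=3):
    """Return indices of the k labeled startups most similar to the query,
    using cosine similarity over (hypothetical) numeric profile features."""
    q = query_vec / np.linalg.norm(query_vec)
    E = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    sims = E @ q                      # cosine similarity to each labeled startup
    return np.argsort(-sims)[:k]     # indices of the k nearest neighbors

def build_prompt(query_profile, profiles, labels, idx):
    """Assemble an in-context learning prompt from the retrieved neighbors."""
    lines = ["Predict whether the startup will be successful (yes/no)."]
    for i in idx:
        lines.append(f"Startup: {profiles[i]} -> {'yes' if labels[i] else 'no'}")
    lines.append(f"Startup: {query_profile} ->")
    return "\n".join(lines)

# Toy data: three labeled startups with 2-dimensional feature vectors.
example_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = [1, 0, 1]
profiles = ["fintech, 10 employees", "biotech, 5 employees", "SaaS, 12 employees"]

idx = knn_select(np.array([1.0, 0.1]), example_vecs, k=2)
prompt = build_prompt("fintech, 8 employees", profiles, labels, idx)
```

Because no gradient updates are involved, swapping in a different labeled pool or a different similarity measure requires no retraining, which is what makes the approach attractive in the low-data VC setting the abstract describes.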