Selecting Language Models for Social Science: Start Small, Start Open, and Validate

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of systematic criteria for selecting language models in social science research, taking validity, reliability, reproducibility, and replicability as guiding principles. It proposes a framework that treats replicability as the central criterion and advocates prioritizing small, open-source models. To overcome the limitations of relying on generic pre-deployment benchmarks, the approach pairs narrowly scoped, domain-specific benchmarks with ex-post validation of the full computational workflow. The framework evaluates candidate models on openness, footprint, training data, architecture, and fine-tuning strategy, establishing the first dedicated language-model selection protocol for social science and thereby strengthening the reproducibility and scientific credibility of research findings in the field.
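As an illustration of how these selection dimensions can be documented in practice, the sketch below records openness, footprint, training data, architecture, and fine-tuning details as a machine-readable provenance file, pinning library and Python versions so the same configuration can be re-run later. The model name, field names, and metadata values are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (illustrative, not from the paper): log the model-selection
# criteria -- openness, footprint, training data, architecture, fine-tuning --
# as a provenance record stored alongside the analysis code.

import json
import platform

import transformers

selection_record = {
    "model_id": "distilbert-base-uncased-finetuned-sst-2-english",  # example model
    "model_revision": "main",          # pin a specific commit hash in practice
    "openness": "weights and training code publicly released",
    "footprint_params": "66M",
    "training_data": "English Wikipedia + BookCorpus; fine-tuned on SST-2",
    "architecture": "DistilBERT encoder with classification head",
    "fine_tuning": "supervised fine-tuning on SST-2 sentiment labels",
    "transformers_version": transformers.__version__,
    "python_version": platform.python_version(),
}

# Committing this file with the analysis makes the model choice itself
# part of the reproducible workflow.
with open("model_selection_record.json", "w") as f:
    json.dump(selection_record, f, indent=2)
```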

📝 Abstract
Currently, there are thousands of large pretrained language models (LLMs) available to social scientists. How do we select among them? Using validity, reliability, reproducibility, and replicability as guides, we explore the significance of: (1) model openness, (2) model footprint, (3) training data, and (4) model architectures and fine-tuning. While ex-ante tests of validity (i.e., benchmarks) are often privileged in these discussions, we argue that social scientists cannot altogether avoid validating computational measures (ex-post). Replicability, in particular, is a more pressing guide for selecting language models. Being able to reliably replicate a particular finding that entails the use of a language model necessitates reliably reproducing a task. To this end, we propose starting with smaller, open models, and constructing delimited benchmarks to demonstrate the validity of the entire computational pipeline.
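The abstract's recommendation to start small, start open, and validate can be sketched as a short workflow: run a small open model on a hand-coded, delimited benchmark drawn from the study corpus and score its output against the human codes before trusting it at scale. The task, model, documents, and gold labels below are illustrative assumptions; swap in the actual coding scheme and annotated sample.

```python
# Minimal sketch of "start small, start open, validate ex-post" (assumptions:
# a binary coding task, a small open model, and a tiny hand-coded benchmark).

import random

import numpy as np
import torch
from sklearn.metrics import accuracy_score, cohen_kappa_score
from transformers import pipeline

# Pin seeds so the computational pipeline can be reproduced exactly.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

# A small, openly released model keeps the pipeline re-runnable on
# commodity hardware (footprint) and inspectable (openness).
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Delimited benchmark: a narrow, hand-coded sample from the study corpus,
# not a generic pre-deployment benchmark. (Toy examples shown here.)
documents = [
    "The new policy will devastate local communities.",
    "The reform finally gives workers a fair deal.",
]
human_labels = ["NEGATIVE", "POSITIVE"]  # gold codes from trained annotators

# Ex-post validation: score the model's codes against the human codes.
model_labels = [out["label"] for out in classifier(documents)]
print("Accuracy:", accuracy_score(human_labels, model_labels))
print("Cohen's kappa:", cohen_kappa_score(human_labels, model_labels))
```

Reporting chance-corrected agreement (Cohen's kappa) alongside accuracy, with seeds and versions fixed, makes the ex-post validation step itself reproducible.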
Problem

Research questions and friction points this paper is trying to address.

language models
model selection
replicability
validity
social science
Innovation

Methods, ideas, or system contributions that make the work stand out.

replicability
open language models
ex-post validation
delimited benchmarks
computational reproducibility