Selecting Language Models for Social Science: Start Small, Start Open, and Validate

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of systematic criteria for selecting language models in social science research, taking validity, reliability, reproducibility, and replicability as guiding principles. It proposes a framework that treats replicability as the central criterion and advocates prioritizing small, open-source models. To overcome the limitations of relying on generic pre-deployment benchmarks, the approach pairs narrowly scoped, domain-specific benchmarks with ex-post validation of the full computational workflow. The framework evaluates candidate models on openness, footprint, training data, architecture, and fine-tuning strategy, establishing the first dedicated language-model selection protocol for social science and thereby strengthening the reproducibility and scientific credibility of research findings in the field.
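As an illustration of how these selection dimensions can be documented in practice, the sketch below records openness, footprint, training data, architecture, and fine-tuning details as a machine-readable provenance file, pinning library and Python versions so the same configuration can be re-run later. The model name, field names, and metadata values are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (illustrative, not from the paper): log the model-selection
# criteria -- openness, footprint, training data, architecture, fine-tuning --
# as a provenance record stored alongside the analysis code.

import json
import platform

import transformers

selection_record = {
    "model_id": "distilbert-base-uncased-finetuned-sst-2-english",  # example model
    "model_revision": "main",          # pin a specific commit hash in practice
    "openness": "weights and training code publicly released",
    "footprint_params": "66M",
    "training_data": "English Wikipedia + BookCorpus; fine-tuned on SST-2",
    "architecture": "DistilBERT encoder with classification head",
    "fine_tuning": "supervised fine-tuning on SST-2 sentiment labels",
    "transformers_version": transformers.__version__,
    "python_version": platform.python_version(),
}

# Committing this file with the analysis makes the model choice itself
# part of the reproducible workflow.
with open("model_selection_record.json", "w") as f:
    json.dump(selection_record, f, indent=2)
```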

📝 Abstract
Currently, there are thousands of large pretrained language models (LLMs) available to social scientists. How do we select among them? Using validity, reliability, reproducibility, and replicability as guides, we explore the significance of: (1) model openness, (2) model footprint, (3) training data, and (4) model architectures and fine-tuning. While ex-ante tests of validity (i.e., benchmarks) are often privileged in these discussions, we argue that social scientists cannot altogether avoid validating computational measures (ex-post). Replicability, in particular, is a more pressing guide for selecting language models. Being able to reliably replicate a particular finding that entails the use of a language model necessitates reliably reproducing a task. To this end, we propose starting with smaller, open models, and constructing delimited benchmarks to demonstrate the validity of the entire computational pipeline.
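The abstract's recommendation to start small, start open, and validate can be sketched as a short workflow: run a small open model on a hand-coded, delimited benchmark drawn from the study corpus and score its output against the human codes before trusting it at scale. The task, model, documents, and gold labels below are illustrative assumptions; swap in the actual coding scheme and annotated sample.

```python
# Minimal sketch of "start small, start open, validate ex-post" (assumptions:
# a binary coding task, a small open model, and a tiny hand-coded benchmark).

import random

import numpy as np
import torch
from sklearn.metrics import accuracy_score, cohen_kappa_score
from transformers import pipeline

# Pin seeds so the computational pipeline can be reproduced exactly.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

# A small, openly released model keeps the pipeline re-runnable on
# commodity hardware (footprint) and inspectable (openness).
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Delimited benchmark: a narrow, hand-coded sample from the study corpus,
# not a generic pre-deployment benchmark. (Toy examples shown here.)
documents = [
    "The new policy will devastate local communities.",
    "The reform finally gives workers a fair deal.",
]
human_labels = ["NEGATIVE", "POSITIVE"]  # gold codes from trained annotators

# Ex-post validation: score the model's codes against the human codes.
model_labels = [out["label"] for out in classifier(documents)]
print("Accuracy:", accuracy_score(human_labels, model_labels))
print("Cohen's kappa:", cohen_kappa_score(human_labels, model_labels))
```

Reporting chance-corrected agreement (Cohen's kappa) alongside accuracy, with seeds and versions fixed, makes the ex-post validation step itself reproducible.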
Problem

Research questions and friction points this paper is trying to address.

language models
model selection
replicability
validity
social science
Innovation

Methods, ideas, or system contributions that make the work stand out.

replicability
open language models
ex-post validation
delimited benchmarks
computational reproducibility