Sample Size Calculations for Developing Clinical Prediction Models: Overview and pmsims R package

📅 2026-02-26
🤖 AI Summary
This study addresses insufficient sample sizes in clinical prediction model development, which often lead to overfitting and poor generalizability, and the difficulty existing methods have in reliably estimating the minimum required sample size. The authors propose a simulation-based, assurance-based sample size calculation framework that combines learning curves, Gaussian process optimization, and user-defined performance criteria in a model-agnostic design. Implemented in the open-source R package pmsims, the approach accommodates diverse modeling techniques and evaluation metrics while explicitly quantifying uncertainty in model performance. Case studies indicate that pmsims compares favorably with current tools in flexibility, computational efficiency, and breadth of applicability.

📝 Abstract
Background: Clinical prediction models are increasingly used to inform healthcare decisions, but determining the minimum sample size for their development remains a critical and unresolved challenge. Inadequate sample sizes can lead to overfitting, poor generalisability, and biased predictions. Existing approaches, such as heuristic rules, closed-form formulas, and simulation-based methods, vary in flexibility and accuracy, particularly for complex data structures and machine learning models.

Methods: We review current methodologies for sample size estimation in prediction modelling and introduce a conceptual framework that distinguishes between mean-based and assurance-based criteria. Building on this, we propose a novel simulation-based approach that integrates learning curves, Gaussian Process optimisation, and assurance principles to identify sample sizes that achieve target performance with high probability. This approach is implemented in pmsims, an open-source, model-agnostic R package.

Results: Through case studies, we demonstrate that sample size estimates vary substantially across methods, performance metrics, and modelling strategies. Compared to existing tools, pmsims provides flexible, efficient, and interpretable solutions that accommodate diverse models and user-defined metrics while explicitly accounting for variability in model performance.

Conclusions: Our framework and software advance sample size methodology for clinical prediction modelling by combining flexibility with computational efficiency. Future work should extend these methods to hierarchical and multimodal data, incorporate fairness and stability metrics, and address challenges such as missing data and complex dependency structures.
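
The distinction between mean-based and assurance-based criteria can be illustrated with a small simulation. The sketch below is not the pmsims API and is not the authors' algorithm: it is written in Python with the standard library only, it replaces Gaussian Process optimisation with a plain search over a fixed grid of sample sizes, and it uses a toy difference-of-class-means classifier as a stand-in for a real prediction model. The data-generating settings (`P`, `N_SIGNAL`, `SHIFT`), the target AUC of 0.70, and the assurance level of 0.8 are all hypothetical values chosen for illustration.

```python
import random
import statistics

# All settings below are hypothetical, chosen only for illustration.
P = 10         # number of candidate predictors
N_SIGNAL = 3   # predictors that truly differ between outcome classes
SHIFT = 0.5    # mean shift of each signal predictor in the event class


def simulate(n, rng):
    """Draw n patients: balanced binary outcome, unit-variance Gaussian predictors."""
    ys = [i % 2 for i in range(n)]
    xs = [[rng.gauss(SHIFT * y if j < N_SIGNAL else 0.0, 1.0) for j in range(P)]
          for y in ys]
    return xs, ys


def fit_centroid(xs, ys):
    """Toy model: weights are the difference between the two class means."""
    sums = {0: [0.0] * P, 1: [0.0] * P}
    counts = {0: 0, 1: 0}
    for x, y in zip(xs, ys):
        counts[y] += 1
        for j in range(P):
            sums[y][j] += x[j]
    return [sums[1][j] / counts[1] - sums[0][j] / counts[0] for j in range(P)]


def auc(scores, ys):
    """C-statistic via the rank-sum (Mann-Whitney) identity; assumes no tied scores."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if ys[i] == 1)
    n1 = sum(ys)
    n0 = len(ys) - n1
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def performance_by_size(grid, reps, seed=1):
    """For each training size, refit on fresh data `reps` times and record AUC."""
    rng = random.Random(seed)
    val_x, val_y = simulate(800, rng)  # one fixed validation sample
    perf = {}
    for n in grid:
        aucs = []
        for _ in range(reps):
            w = fit_centroid(*simulate(n, rng))
            scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in val_x]
            aucs.append(auc(scores, val_y))
        perf[n] = aucs
    return perf


def smallest_n(perf, target, assurance=None):
    """Smallest grid size meeting the criterion, or None if no size does.

    assurance=None  -> mean-based: average AUC reaches the target.
    assurance=0.8   -> assurance-based: at least 80% of replicates reach it.
    """
    for n in sorted(perf):
        aucs = perf[n]
        if assurance is None:
            ok = statistics.mean(aucs) >= target
        else:
            ok = sum(a >= target for a in aucs) / len(aucs) >= assurance
        if ok:
            return n
    return None


perf = performance_by_size([40, 80, 160, 320, 640], reps=40)
n_mean = smallest_n(perf, target=0.70)
n_assured = smallest_n(perf, target=0.70, assurance=0.8)
print("mean-based n:", n_mean, "| assurance-based n:", n_assured)
```

The per-size lists in `perf` trace a crude learning curve. The mean-based rule looks only at the average of each list, whereas the assurance-based rule asks how often an individual fitted model reaches the target, so it typically calls for a larger sample when performance varies from one training set to the next. Either rule returns `None` when no size on the grid satisfies it.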
Problem

Research questions and friction points this paper is trying to address.

sample size
clinical prediction models
overfitting
generalisability
prediction bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

simulation-based sample size
Gaussian Process optimisation
assurance-based criteria
learning curves
model-agnostic
Diana Shamsutdinova
Department of Biostatistics and Health Informatics, King’s College London; NIHR Biomedical Research Centre, Maudsley NHS Trust
Felix Zimmer
Department of Biostatistics and Health Informatics, King’s College London
Oyebayo Ridwan Olaniran
Department of Biostatistics and Health Informatics, King’s College London; NIHR Biomedical Research Centre, Maudsley NHS Trust
Sarah Markham
Department of Biostatistics and Health Informatics, King’s College London
Daniel Stahl
Department of Biostatistics and Health Informatics, IoPPN, King's College London
Statistics; Machine/Statistical Learning; Prediction modeling; Causal modelling; Clinical Trials
Gordon Forbes
Department of Biostatistics and Health Informatics, King’s College London; NIHR Biomedical Research Centre, Maudsley NHS Trust
Ewan Carr
Department of Biostatistics and Health Informatics, King’s College London