Adaptive Gaussian Process Search for Simulation-Based Sample Size Estimation in Clinical Prediction Models: Validation of the pmsims R Package

📅 2026-03-24
🤖 AI Summary
This study addresses the lack of efficient and flexible sample size calculation methods for developing clinical prediction models with binary, continuous, and time-to-event outcomes, where existing approaches often rely on restrictive assumptions or focus solely on mean-based criteria. The authors present an adaptive search framework based on Gaussian process surrogate modelling, implemented in the R package pmsims alongside a deterministic bisection engine and a hybrid engine. In a simulation study, the Gaussian process engine gave the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking against pmsampsize (analytical) and samplesizedev (simulation-based), pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches, outperforming analytical methods in more challenging scenarios, and requiring fewer model evaluations than non-adaptive simulation.

📝 Abstract
Background: Determining an adequate sample size is essential for developing reliable and generalisable clinical prediction models, yet practical guidance on selecting appropriate methods remains limited. Existing analytical and simulation-based approaches often rely on restrictive assumptions and focus on mean-based criteria. We present and validate pmsims, an R package that uses Gaussian process surrogate modelling to provide a flexible and computationally efficient simulation-based framework for sample size determination across diverse prediction settings.

Methods: We conducted a comprehensive simulation study with two aims. First, we compared three search engines implemented in pmsims (a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach) across binary, continuous, and survival outcomes. Second, we benchmarked the best-performing pmsims engine against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods, evaluating recommended sample sizes, computational time, and achieved performance on large independent validation datasets.

Results: The Gaussian process-based method consistently produced the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking, pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches and outperforming analytical methods in more challenging scenarios.

Conclusions: pmsims provides an efficient and flexible framework for principled sample size planning in clinical prediction modelling, requiring fewer model evaluations than non-adaptive simulation approaches.
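To make the idea of searching for a sample size that meets a performance target concrete, here is a minimal Python sketch of a bisection-style search. It is not the pmsims implementation and does not use the pmsims API. The function names, the learning-curve formula, and all parameter values are hypothetical, chosen only for illustration. In a real workflow, `simulated_performance` would simulate data, fit the prediction model on n rows, and score it on validation data.

```python
import random
from statistics import mean


def simulated_performance(n, rng, reps=20, ceiling=0.80, rate=50.0, noise_sd=0.02):
    """Hypothetical stand-in for 'fit a model on n rows and score it'.

    ASSUMPTION: expected performance follows an invented learning curve,
    ceiling * n / (n + rate), and each replicate adds Gaussian noise.
    """
    expected = ceiling * n / (n + rate)
    return mean(expected + rng.gauss(0.0, noise_sd) for _ in range(reps))


def bisection_sample_size(target, lo, hi, rng, **kwargs):
    """Return the smallest n in [lo, hi] judged to reach `target`.

    Assumes performance is non-decreasing in n. With noisy estimates the
    result is approximate, since each comparison can be wrong by chance.
    """
    if simulated_performance(hi, rng, **kwargs) < target:
        raise ValueError("target not reached at the upper bound")
    while lo < hi:
        mid = (lo + hi) // 2
        if simulated_performance(mid, rng, **kwargs) >= target:
            hi = mid        # target met: the answer is mid or smaller
        else:
            lo = mid + 1    # target missed: the answer is larger than mid
    return lo


rng = random.Random(2026)  # fixed seed so the run is reproducible
n_noisy = bisection_sample_size(target=0.77, lo=50, hi=5000, rng=rng)
n_exact = bisection_sample_size(target=0.77, lo=50, hi=5000, rng=rng, noise_sd=0.0)
print(n_noisy, n_exact)
```

Each bisection step costs one batch of simulated fits, so the search needs roughly log2(hi - lo) batches. Because every comparison uses a noisy estimate, the noisy result can drift from the noiseless one. The abstract reports that the Gaussian process-based engine gave more stable estimates than bisection; a surrogate fitted to all evaluations at once is one way to reduce that sensitivity to noise.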
Problem

Research questions and friction points this paper is trying to address.

sample size estimation
clinical prediction models
simulation-based methods
Gaussian process
model development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian process
adaptive search
sample size estimation
simulation-based
clinical prediction models
Oyebayo Ridwan Olaniran
Department of Biostatistics and Health Informatics, King’s College London, London, United Kingdom; NIHR Biomedical Research Centre, Maudsley NHS Trust, London, United Kingdom
Diana Shamsutdinova
Department of Biostatistics and Health Informatics, King’s College London, London, United Kingdom; NIHR Biomedical Research Centre, Maudsley NHS Trust, London, United Kingdom
Sarah Markham
Department of Biostatistics and Health Informatics, King’s College London, London, United Kingdom
Felix Zimmer
Department of Biostatistics and Health Informatics, King’s College London, London, United Kingdom
Daniel Stahl
Department of Biostatistics and Health Informatics, IoPPN, King's College London
Statistics, Machine/Statistical Learning, Prediction modeling, Causal modelling, Clinical Trials
Gordon Forbes
Department of Biostatistics and Health Informatics, King’s College London, London, United Kingdom; NIHR Biomedical Research Centre, Maudsley NHS Trust, London, United Kingdom
Ewan Carr
Department of Biostatistics and Health Informatics, King’s College London, London, United Kingdom