Adaptive Gaussian Process Search for Simulation-Based Sample Size Estimation in Clinical Prediction Models: Validation of the pmsims R Package

📅 2026-03-24

🤖 AI Summary

This study addresses the lack of efficient and flexible sample size calculation methods for developing clinical prediction models across binary, continuous, and time-to-event outcomes, where existing approaches often rely on restrictive assumptions or focus solely on mean-based performance criteria. The authors present pmsims, an R package built around Gaussian process surrogate modeling, and compare its three search engines: a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach. The Gaussian process-based engine gave the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking against pmsampsize (analytical) and samplesizedev (simulation-based), pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches, outperforming analytical methods in more challenging scenarios, and requiring fewer model evaluations than non-adaptive simulation.

📝 Abstract

Background: Determining an adequate sample size is essential for developing reliable and generalisable clinical prediction models, yet practical guidance on selecting appropriate methods remains limited. Existing analytical and simulation-based approaches often rely on restrictive assumptions and focus on mean-based criteria. We present and validate pmsims, an R package that uses Gaussian process surrogate modelling to provide a flexible and computationally efficient simulation-based framework for sample size determination across diverse prediction settings.

Methods: We conducted a comprehensive simulation study with two aims. First, we compared three search engines implemented in pmsims: a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach, across binary, continuous, and survival outcomes. Second, we benchmarked the best-performing pmsims engine against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods, evaluating recommended sample sizes, computational time, and achieved performance on large independent validation datasets.

Results: The Gaussian process-based method consistently produced the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking, pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches and outperforming analytical methods in more challenging scenarios.

Conclusions: pmsims provides an efficient and flexible framework for principled sample size planning in clinical prediction modelling, requiring fewer model evaluations than non-adaptive simulation approaches.
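
To make the idea of a search engine for simulation-based sample size concrete, the following is a minimal Python sketch of a deterministic bisection search, the simplest of the three engine types named in the abstract. It is not the pmsims implementation or its API. The function names, the toy learning curve, the noise model, and all numeric constants are assumptions chosen purely for illustration; in a real study `simulate_performance` would fit a prediction model on simulated data of size n and score it on a large validation set.

```python
import random
import statistics


def simulate_performance(n, n_reps=200, seed=0):
    """Toy stand-in for 'fit a model on n samples and score it'.

    Hypothetical learning curve (an assumption, not from the paper):
    performance rises toward a ceiling of 0.80 as n grows, and each
    replicate adds Monte Carlo noise that shrinks with sqrt(n).
    Returns the mean performance over n_reps replicates.
    """
    rng = random.Random(seed * 1_000_003 + n)  # reproducible per (seed, n)
    ceiling, half_n = 0.80, 150
    true_perf = ceiling * n / (n + half_n)
    draws = [true_perf + rng.gauss(0, 0.5 / n ** 0.5) for _ in range(n_reps)]
    return statistics.fmean(draws)


def bisection_sample_size(target, lo=50, hi=20_000, tol=10):
    """Search for a small n whose simulated mean performance meets target.

    Assumes performance is (approximately) increasing in n.
    Returns (recommended_n, number_of_interior_evaluations).
    """
    if simulate_performance(hi) < target:
        raise ValueError("target performance not reached at the upper bound")
    evaluations = 0
    while hi - lo > tol:
        mid = (lo + hi) // 2
        evaluations += 1
        if simulate_performance(mid) >= target:
            hi = mid  # target met: try smaller sample sizes
        else:
            lo = mid  # target missed: need more data
    return hi, evaluations


if __name__ == "__main__":
    n, evals = bisection_sample_size(target=0.75)
    print(f"recommended n = {n} after {evals} evaluations")
```

Each bisection step trusts a single noisy estimate, so a misleading evaluation near the target can send the search the wrong way. A surrogate-based search instead fits a smooth model to all evaluations collected so far, which is one plausible reason the abstract reports more stable estimates from the Gaussian process engine.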

Problem

Research questions and friction points this paper is trying to address.

sample size estimation

clinical prediction models

simulation-based methods

Gaussian process

model development

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian process

adaptive search

sample size estimation

simulation-based

clinical prediction models
