A decomposition of Fisher's information to inform sample size for developing or updating fair and precise clinical prediction models -- Part 3: continuous outcomes

📅 2025-07-31
🤖 AI Summary
Existing sample size guidelines for clinical prediction models with continuous outcomes focus on preventing overfitting and on estimating population-level parameters precisely. They do not address the precision of individual predictions, in particular the width of confidence intervals around them, or whether that precision is fair across subgroups, which limits clinical utility. Method: The authors propose a sample size framework based on linear regression theory and Fisher's unit information matrix. It quantifies how sample size affects the epistemic (model-based) uncertainty of each prediction, can evaluate prediction precision within subgroups to assess fairness, and extends to prediction intervals that additionally account for aleatoric (individual-level) uncertainty. The method requires real or synthetic data representing the target population. Results: The framework supports both retrospective assessment of whether an existing dataset is large enough and prospective calculation of the sample size needed to achieve a target confidence interval width around predictions, with the aim of more reliable and equitable prediction models.
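As a rough orientation, the quantities the summary refers to can be written down for a linear regression with normally distributed residuals of variance σ². This is a sketch in our own notation, not the paper's: X is the n0 × p design matrix of real or synthetic data representing the target population, x_i is one individual's covariate vector, n is the candidate development sample size, w is a target interval width, and a normal quantile z stands in for a t quantile.

```latex
\begin{aligned}
I_{\mathrm{unit}} &= \frac{1}{\sigma^{2}}\,\frac{X^{\top}X}{n_0}, \\
\operatorname{Var}(\hat{y}_i) &= x_i^{\top}\bigl(n\,I_{\mathrm{unit}}\bigr)^{-1}x_i
  = \frac{x_i^{\top}I_{\mathrm{unit}}^{-1}x_i}{n}, \\
W^{\mathrm{CI}}_i(n) &\approx 2\,z_{1-\alpha/2}\sqrt{\frac{x_i^{\top}I_{\mathrm{unit}}^{-1}x_i}{n}}, \\
W^{\mathrm{CI}}_i(n) \le w &\iff n \ge \frac{4\,z_{1-\alpha/2}^{2}\,x_i^{\top}I_{\mathrm{unit}}^{-1}x_i}{w^{2}}, \\
W^{\mathrm{PI}}_i(n) &\approx 2\,z_{1-\alpha/2}\sqrt{\sigma^{2} + \frac{x_i^{\top}I_{\mathrm{unit}}^{-1}x_i}{n}}.
\end{aligned}
```

In the prediction-interval width, σ² is the aleatoric part and the term divided by n is the epistemic part.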


📝 Abstract
Clinical prediction models enable healthcare professionals to estimate individual outcomes using patient characteristics. Current sample size guidelines for developing or updating models with continuous outcomes aim to minimise overfitting and ensure accurate estimation of population-level parameters, but do not explicitly address the precision of predictions. This is a critical limitation, as wide confidence intervals around predictions can undermine clinical utility and fairness, particularly if precision varies across subgroups. We propose methodology for calculating the sample size required to ensure precise and fair predictions in models with continuous outcomes. Building on linear regression theory and Fisher's unit information matrix, our approach calculates how sample size impacts the epistemic (model-based) uncertainty of predictions and allows researchers to either (i) evaluate whether an existing dataset is sufficiently large, or (ii) determine the sample size needed to target a particular confidence interval width around predictions. The method requires real or synthetic data representing the target population. To assess fairness, the approach can evaluate prediction precision across subgroups. Extensions to prediction intervals are included to additionally address aleatoric uncertainty. Our methodology provides a practical framework for examining required sample sizes when developing or updating prediction models with continuous outcomes, focusing on achieving precise and equitable predictions. It supports the development of more reliable and fair models, enhancing their clinical applicability and trustworthiness.
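A minimal Python sketch of uses (i) and (ii), under stated assumptions: a linear model with a known (assumed) residual variance σ², a normal rather than t quantile, and a design matrix `X` whose rows are (1, covariate values) from data representing the target population. The helper names (`unit_information`, `ci_widths`, `required_n`) are hypothetical and this is not the authors' implementation. `ci_widths` addresses (i) by reporting each individual's approximate CI width at a given sample size; `required_n` addresses (ii) with one possible criterion, namely that every individual's CI width is at most the target, whereas the paper may summarise the distribution of widths differently.

```python
import math
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("unit information matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def unit_information(X, sigma2):
    """Fisher information for beta per participant: (X'X / n0) / sigma^2."""
    n0, p = len(X), len(X[0])
    return [[sum(x[j] * x[k] for x in X) / (n0 * sigma2) for k in range(p)]
            for j in range(p)]


def quad_form(x, M):
    """Return x' M x."""
    return sum(x[j] * M[j][k] * x[k] for j in range(len(x)) for k in range(len(x)))


def ci_widths(X, sigma2, n, level=0.95):
    """Approximate CI width around each individual's prediction if the model
    were developed on n participants: 2 z sqrt(x' I_unit^{-1} x / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    inv_unit = invert(unit_information(X, sigma2))
    return [2 * z * math.sqrt(quad_form(x, inv_unit) / n) for x in X]


def required_n(X, sigma2, target_width, level=0.95):
    """Smallest n for which every individual's CI width is <= target_width
    (one possible criterion; a percentile of the widths could be used instead)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    inv_unit = invert(unit_information(X, sigma2))
    worst = max(quad_form(x, inv_unit) for x in X)
    return math.ceil((2 * z) ** 2 * worst / target_width ** 2)


# Made-up synthetic data: intercept plus one standardised predictor.
X = [[1.0, x] for x in (-1.5, -0.5, 0.0, 0.4, 1.2, 2.0)]
sigma2 = 4.0  # assumed residual variance
print([round(w, 2) for w in ci_widths(X, sigma2, n=200)])
print(required_n(X, sigma2, target_width=1.0))
```

Because each width scales as 1/√n, quadrupling the sample size halves every confidence interval width, which is a quick sanity check on any implementation.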
Problem


Ensuring precise predictions in clinical models with continuous outcomes
Addressing fairness by evaluating prediction precision across subgroups
Determining the sample size needed to control epistemic uncertainty, with prediction intervals also accounting for aleatoric uncertainty
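To illustrate the fairness check (prediction precision compared across subgroups), here is a sketch for the special case of a model with an intercept and a single continuous predictor. In that case x_i' I_unit^{-1} x_i reduces to σ²(1 + (x_i − m)²/s²), where m and s² are the mean and population variance of the predictor in the target-population data, so individuals far from the mean get wider intervals. The function name `subgroup_ci_widths`, the known σ² and the normal quantile are assumptions for illustration; the subgroup label is not a model covariate here.

```python
import math
from collections import defaultdict
from statistics import NormalDist, fmean, pvariance


def subgroup_ci_widths(xs, groups, sigma2, n, level=0.95):
    """Mean and maximum CI width around predictions, per subgroup, for a
    hypothetical model y = b0 + b1 * x + e developed on n participants."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m, s2 = fmean(xs), pvariance(xs)
    by_group = defaultdict(list)
    for x, g in zip(xs, groups):
        # x' I_unit^{-1} x for an intercept + slope model, divided by n
        var_pred = sigma2 * (1 + (x - m) ** 2 / s2) / n
        by_group[g].append(2 * z * math.sqrt(var_pred))
    return {g: (fmean(w), max(w)) for g, w in by_group.items()}


# Made-up example: subgroup "B" sits in the tails of the predictor distribution.
xs = [-1.0, 0.0, 1.0, -3.0, 3.0]
groups = ["A", "A", "A", "B", "B"]
print(subgroup_ci_widths(xs, groups, sigma2=1.0, n=100))
```

A marked gap between subgroups at a candidate n would flag that the development sample, or its composition, may need to change for predictions to be equally precise.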
Innovation


Decompose Fisher's unit information to calculate sample size
Evaluate prediction precision across subgroups
Extend to prediction intervals to account for aleatoric uncertainty
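Extending to prediction intervals adds the residual (aleatoric) variance σ² to the epistemic variance of the prediction. The sketch below, with hypothetical helpers `interval_widths`, `pi_floor` and `required_n_for_pi`, takes the per-individual term x_i' I_unit^{-1} x_i as a given input (how it is computed depends on the model), assumes σ² is known and uses a normal quantile. It shows that the prediction interval width approaches a floor of about 2zσ as n grows, so a target narrower than that floor cannot be met by any sample size.

```python
import math
from statistics import NormalDist


def interval_widths(unit_var, sigma2, n, level=0.95):
    """Return (CI width, PI width) for one individual's prediction.

    unit_var: x' I_unit^{-1} x for that individual (hypothetical input).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    epistemic = unit_var / n                     # shrinks as n grows
    ci = 2 * z * math.sqrt(epistemic)
    pi = 2 * z * math.sqrt(sigma2 + epistemic)   # sigma2 (aleatoric) does not shrink
    return ci, pi


def pi_floor(sigma2, level=0.95):
    """Limit of the PI width as n -> infinity: only aleatoric variance remains."""
    return 2 * NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(sigma2)


def required_n_for_pi(unit_var, sigma2, target_width, level=0.95):
    """Smallest n with PI width <= target_width, or None if the target is
    below the aleatoric floor and therefore unattainable."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    slack = (target_width / (2 * z)) ** 2 - sigma2
    if slack <= 0:
        return None
    return math.ceil(unit_var / slack)


unit_var = 5.0  # made-up value of x_i' I_unit^{-1} x_i
sigma2 = 4.0    # assumed residual variance
for n in (50, 500, 5000):
    print(n, [round(w, 3) for w in interval_widths(unit_var, sigma2, n)])
print(round(pi_floor(sigma2), 3), required_n_for_pi(unit_var, sigma2, 8.0))
```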
Authors
Rebecca Whittle
Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK
Richard D Riley
University of Birmingham, UK.
Meta-analysis, prognosis research, risk prediction
Lucinda Archer
Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK; Institute of Data and AI, University of Birmingham, UK
Gary S Collins
Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
Amardeep Legha
Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK
Kym IE Snell
Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK
Joie Ensor
Department of Applied Health Sciences, School of Health Sciences, College of Medicine and Health, University of Birmingham, Birmingham, UK; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, UK