🤖 AI Summary
Measurement error in covariates is pervasive in epidemiology and can substantially bias estimated exposure–outcome relationships, yet correction methods have been studied almost exclusively under linear modelling assumptions, and their behaviour for nonlinear associations is poorly understood. This study presents a blinded, multi-stage neutral comparison simulation study of six families of correction methods (pointwise and coefficient-level SIMEX, Bayesian inference on the logit and risk scales, multiple imputation, and regression calibration), each combined with four flexible modeling techniques (B-splines, penalized splines, fractional polynomials, and natural splines). Results indicate that pointwise SIMEX yields the most accurate and robust estimates overall, while penalized splines, fractional polynomials, and natural splines perform comparably and outperform B-splines. No single approach consistently dominates across all scenarios, underscoring the need for sensitivity analyses that address uncertainty in both the measurement error correction and the functional form specification.
📝 Abstract
Covariate measurement error is pervasive in epidemiological research and distorts estimated exposure-outcome associations, yet correction methods have been studied almost exclusively under linear modelling assumptions. Their behaviour when the underlying association is non-linear, and is itself estimated with flexible regression, remains poorly characterised. We report a blinded, multi-stage neutral comparison simulation study, conducted within the STRATOS initiative, evaluating measurement error correction coupled with flexible modelling of functional form. Six families of correction methods (pointwise and coefficient-based Simulation Extrapolation [SIMEX], Bayesian inference on the logit and risk scales, Multiple Imputation [MI], and Regression Calibration [RC]) were each combined with B-splines (BS), penalised splines (PS), fractional polynomials (FP), and natural splines (NS), yielding 23 analytic methods. Methods were applied to case-control data generated under five functional forms (J-shape, linear, two threshold models, and saturation) across simulated datasets spanning varying sample sizes, replication substudy sizes, error magnitudes, and error distributions, with classical additive error and a replication substudy for error calibration. Performance was assessed by the log mean squared error of the estimated function over the central 95% of the exposure distribution. Pointwise SIMEX was the most accurate and most robust approach overall, followed by Bayesian methods and RC when paired with PS, FP, or NS; MI performed less well, and Bayesian estimation with unpenalised BS performed worst. PS, FP, and NS were near-equivalent, whereas BS was consistently inferior. No single method dominated across all scenarios, underscoring the value of sensitivity analyses.
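To make the core idea of SIMEX under classical additive error concrete, here is a minimal sketch. It is not the study's implementation. It makes several simplifying assumptions: a continuous outcome with simple linear regression instead of case-control logistic models, a single linear slope instead of a flexible spline or fractional-polynomial fit, and coefficient-level rather than pointwise extrapolation. The error variance is estimated from a replication substudy as the pooled within-subject variance. Extra noise with variance λ·σ²_u is then added to the observed exposure over a grid of λ values, and the mean slope is extrapolated back to λ = −1 (no measurement error) with a quadratic. All function names and default settings (the λ grid, 40 simulations per λ) are hypothetical illustrations.

```python
import random
import statistics


def error_variance_from_replicates(replicates):
    """Pooled within-subject variance of replicate measurements.

    Under classical additive error W = X + U, the spread of repeated W
    values for the same subject estimates Var(U)."""
    ss, dof = 0.0, 0
    for reps in replicates:
        if len(reps) < 2:
            continue  # a single measurement carries no information on Var(U)
        m = statistics.fmean(reps)
        ss += sum((w - m) ** 2 for w in reps)
        dof += len(reps) - 1
    return ss / dof


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def quad_extrapolate(lams, vals, target=-1.0):
    """Fit vals ~ a + b*lam + c*lam**2 by least squares, evaluate at target."""
    rows = [[1.0, lam, lam * lam] for lam in lams]
    # Augmented normal-equation matrix [X'X | X'v]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * v for r, v in zip(rows, vals))] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, 3):
            f = m[k][col] / m[col][col]
            for j in range(col, 4):
                m[k][j] -= f * m[col][j]
    # Back-substitution for the coefficients (a, b, c)
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        beta[i] = (m[i][3] - sum(m[i][j] * beta[j] for j in range(i + 1, 3))) / m[i][i]
    a, b, c = beta
    return a + b * target + c * target ** 2


def simex_slope(w, y, sigma2_u, lams=(0.0, 0.5, 1.0, 1.5, 2.0), n_sim=40, seed=1):
    """Coefficient-level SIMEX for the slope of y on an error-prone w."""
    rng = random.Random(seed)
    mean_est = []
    for lam in lams:
        sd_extra = (lam * sigma2_u) ** 0.5
        # Simulation step: total error variance becomes (1 + lam) * sigma2_u
        ests = [ols_slope([wi + rng.gauss(0.0, sd_extra) for wi in w], y)
                for _ in range(n_sim)]
        mean_est.append(statistics.fmean(ests))
    # Extrapolation step: lam = -1 corresponds to zero measurement error
    return quad_extrapolate(lams, mean_est)


if __name__ == "__main__":
    rng = random.Random(7)
    x = [rng.gauss(0, 1) for _ in range(1000)]          # true exposure
    w = [xi + rng.gauss(0, 0.5 ** 0.5) for xi in x]     # observed, Var(U) = 0.5
    y = [2.0 * xi + rng.gauss(0, 1) for xi in x]        # true slope 2
    # Replication substudy: two error-prone measurements for 200 subjects
    reps = [[xi + rng.gauss(0, 0.5 ** 0.5) for _ in range(2)] for xi in x[:200]]
    s2 = error_variance_from_replicates(reps)
    print("naive:", ols_slope(w, y), "SIMEX:", simex_slope(w, y, s2))
```

In this toy setting the naive slope is attenuated by roughly Var(X)/(Var(X) + σ²_u) = 2/3. The quadratic extrapolant typically removes much of that bias but need not remove all of it, because it only approximates the true dependence of the estimate on λ. The pointwise variant named in the abstract applies the same simulate-then-extrapolate logic to the fitted curve at each exposure value rather than to a single coefficient.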