🤖 AI Summary
In cardiovascular cohort studies, ICD-9 codes frequently overestimate cardiovascular disease (CVD) events. The problem is most acute in aging cohorts that lack event adjudication (e.g., EPESE), where the resulting misclassification biases risk estimates. To address this, we propose a Bayesian joint modeling framework that combines Bayesian additive regression trees (BART) with posterior predictive inference. Using a related cohort that records both ICD-9-coded and adjudicated events (e.g., ARIC), the model calibrates ICD-9 coding errors so that adjudicated CVD events can be predicted in non-adjudicated datasets. It imposes no assumptions about the missingness mechanism of the gold standard. In simulation studies and an empirical analysis of ARIC data, the method substantially reduces CVD event overestimation, improves event classification accuracy, and makes risk factor effect estimates more reliable. The result is a generalizable, resource-efficient error-correction framework for aging cohort studies in which systematic event adjudication is infeasible.
📝 Abstract
An important issue in joint modelling of outcomes and longitudinal risk factors in cohort studies is the accurate assessment of events. Events determined from ICD-9 codes can be very inaccurate, particularly for cardiovascular disease (CVD), where ICD-9 codes may overestimate event frequency. Motivated by the lack of adjudicated events in the Established Populations for Epidemiologic Studies of the Elderly (EPESE) cohort, we develop methods that use a related cohort, Atherosclerosis Risk in Communities (ARIC), which has both ICD-9-coded events and adjudicated events, to create a posterior predictive distribution of adjudicated events. The methods are based on flexible Bayesian joint models combined with Bayesian additive regression trees (BART) to directly address the ICD-9 misclassification. We assess the performance of our approach in a simulation study and apply it to ARIC data.
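The calibration idea can be illustrated with a deliberately simplified sketch. It is not the authors' model: a conjugate Beta-Bernoulli model per ICD-9 stratum stands in for the BART-based joint model, covariates and longitudinal risk factors are ignored, and every name and number below (the function `posterior_predictive_adjudicated`, the toy counts) is hypothetical. It shows only how a reference cohort with both ICD-9 flags and adjudicated events might yield posterior predictive draws of adjudicated events for a cohort that has ICD-9 flags alone.

```python
import numpy as np

def posterior_predictive_adjudicated(icd_ref, adj_ref, icd_target,
                                     n_draws=1000, seed=0):
    """Draw adjudicated-event indicators for a cohort with ICD-9 flags only.

    Assumption: P(adjudicated = 1 | ICD-9 flag) is shared by both cohorts and
    has a Beta(1, 1) prior. This replaces the BART joint model for illustration.
    """
    rng = np.random.default_rng(seed)
    icd_ref = np.asarray(icd_ref, dtype=int)
    adj_ref = np.asarray(adj_ref, dtype=int)
    icd_target = np.asarray(icd_target, dtype=int)

    draws = np.empty((n_draws, icd_target.size), dtype=int)
    for code in (0, 1):
        in_ref = icd_ref == code
        events = adj_ref[in_ref].sum()
        non_events = in_ref.sum() - events
        # Conjugate update: posterior draws of P(adjudicated = 1 | ICD-9 = code).
        p = rng.beta(1 + events, 1 + non_events, size=n_draws)
        in_target = icd_target == code
        # One Bernoulli draw per target subject for each posterior draw of p.
        draws[:, in_target] = rng.binomial(
            1, p[:, None], size=(n_draws, int(in_target.sum()))
        )
    return draws

# Toy reference cohort (hypothetical counts): ICD-9 flags overstate events.
icd_ref = np.r_[np.ones(100, dtype=int), np.zeros(200, dtype=int)]
adj_ref = np.r_[np.ones(60, dtype=int), np.zeros(40, dtype=int),   # ICD-9 positive
                np.ones(4, dtype=int), np.zeros(196, dtype=int)]   # ICD-9 negative

# Toy target cohort with ICD-9 flags only.
icd_target = np.r_[np.ones(50, dtype=int), np.zeros(150, dtype=int)]

draws = posterior_predictive_adjudicated(icd_ref, adj_ref, icd_target)
event_counts = draws.sum(axis=1)  # posterior predictive event count per draw
print("ICD-9 flagged events:", int(icd_target.sum()))
print("Posterior predictive mean of adjudicated events:", event_counts.mean())
```

Each row of `draws` is one imputed set of adjudicated events. A downstream risk model could be fit to each row and the fits combined, which carries the misclassification uncertainty into the risk factor estimates.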