🤖 AI Summary
Existing methods for integrating multiple secondary outcomes (e.g., blood biochemistry, urine biomarkers) to sharpen inference on a primary outcome, here liver health, often rely on strong modeling assumptions or prior knowledge to construct over-identified estimating functions, which limits their robustness and generalizability. This paper proposes a data-driven integrative learning framework that requires no pre-specified functional form or stringent distributional assumptions. Drawing on statistical learning theory, it constructs an integrated estimating equation that balances variance reduction against computational efficiency, enabling adaptive and robust aggregation of heterogeneous secondary outcomes. Simulation studies demonstrate substantial variance reduction in primary outcome estimation. Applied to UK Biobank data, the method identifies a positive, previously unconfirmed association between cigarette smoking and fatty liver measures, with stronger effects among older adults. The work offers a general, assumption-light approach to integrative analysis of multi-source biomarkers.
📝 Abstract
In the era of big data, secondary outcomes have become increasingly important alongside primary outcomes. These secondary outcomes, which can be derived from traditional clinical trial endpoints, compound measures, or risk prediction scores, have the potential to enhance the analysis of primary outcomes. Our method is motivated by the challenge of using multiple secondary outcomes, such as blood biochemistry markers and urine assays, to improve the analysis of a primary outcome related to liver health. Current integration methods often fall short because they impose strong model assumptions or require prior knowledge to construct over-identified working functions. This paper addresses these statistical challenges and potentially opens a new avenue in data integration by introducing a novel integrative learning framework that applies in a general setting. The proposed framework allows robust, data-driven integration of information from multiple secondary outcomes, supports the development of efficient learning algorithms, and makes efficient use of the available data. Extensive simulation studies demonstrate that the proposed method significantly reduces variance in primary outcome analysis and outperforms existing integration approaches. Additionally, applying the method to UK Biobank (UKB) data reveals that cigarette smoking is associated with increased fatty liver measures, with these effects particularly pronounced in the older adult cohort.
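To make the variance-reduction idea concrete, here is a minimal sketch. It is not the paper's framework. It assumes a toy setting in which the primary parameter is the mean of Y, a single secondary outcome S is correlated with Y, and E[S] = 0 is known, so that h(S) = S is a mean-zero working function with no free parameters (an over-identified restriction). The augmented estimator subtracts the projection of the primary estimate onto that working function, a control-variate-style adjustment. All names (`simulate_once`, `compare_variances`, `rho`, `theta`) are hypothetical.

```python
import random
import statistics


def simulate_once(rng, n=200, rho=0.8, theta=1.0):
    """Draw one toy dataset; return (naive, augmented) estimates of theta = E[Y]."""
    # Secondary outcome S with E[S] = 0 assumed known (toy assumption).
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Primary outcome Y with mean theta and correlation rho with S.
    noise_sd = (1.0 - rho ** 2) ** 0.5
    y = [theta + rho * si + noise_sd * rng.gauss(0.0, 1.0) for si in s]

    y_bar = statistics.fmean(y)
    s_bar = statistics.fmean(s)

    # Estimated projection coefficient c = Cov(Y, S) / Var(S).
    cov_ys = sum((yi - y_bar) * (si - s_bar) for yi, si in zip(y, s)) / (n - 1)
    var_s = statistics.variance(s, s_bar)
    c = cov_ys / var_s

    # Naive estimator ignores S; augmented estimator removes the part of
    # y_bar explained by the sample mean of the mean-zero working function.
    return y_bar, y_bar - c * s_bar


def compare_variances(reps=1000, seed=0, **kwargs):
    """Monte Carlo variances of the naive and augmented estimators."""
    rng = random.Random(seed)
    draws = [simulate_once(rng, **kwargs) for _ in range(reps)]
    naive = [d[0] for d in draws]
    augmented = [d[1] for d in draws]
    return statistics.variance(naive), statistics.variance(augmented)


if __name__ == "__main__":
    v_naive, v_aug = compare_variances()
    print(f"naive variance:     {v_naive:.5f}")
    print(f"augmented variance: {v_aug:.5f}")
```

Under these toy assumptions the augmented estimator's variance is approximately (1 - rho²) times the naive variance, so the gain grows with the correlation between the primary and secondary outcomes. The framework described in the abstract targets the harder case in which no such working function is known in advance.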