Simultaneous Estimation and Model Choice for Big Discrete Time-to-Event Data with Additive Predictors

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
In high-dimensional discrete-time risk models, the required data augmentation imposes a substantial computational burden and makes it difficult to achieve accurate estimation and variable selection simultaneously. To address this, we propose a batch-wise backfitting algorithm grounded in the distribution regression framework and, for the first time, extend it to discrete survival models with additive predictors. Our method integrates generalized linear models with an efficient data augmentation scheme, enabling joint parameter estimation and sparse variable selection. Extensive simulations and an application to real-world African infant mortality data demonstrate that the proposed approach substantially reduces runtime and improves computational scalability while maintaining high estimation accuracy and robust feature selection performance. This work establishes an efficient, automated modeling paradigm for large-scale discrete-time event history analysis.

📝 Abstract
Discrete-time hazard models are widely used when event times are measured in intervals or are not precisely observed. While these models can be estimated using standard generalized linear model techniques, they rely on extensive data augmentation, making estimation computationally demanding in high-dimensional settings. In this paper, we demonstrate how the recently proposed Batchwise Backfitting algorithm, a general framework for scalable estimation and variable selection in distributional regression, can be effectively extended to discrete hazard models. Using both simulated data and a large-scale application on infant mortality in sub-Saharan Africa, we show that the algorithm delivers accurate estimates, automatically selects relevant predictors, and scales efficiently to large data sets. The findings underscore the algorithm's practical utility for analysing large-scale, complex survival data with high-dimensional covariates.
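The data augmentation the abstract refers to is typically the person-period (long-format) expansion: each subject observed through interval t contributes t rows with a binary response that is 0 in every interval except, for events, the last one. A minimal sketch of this expansion (the column names and toy data are illustrative, not from the paper):

```python
import pandas as pd

def expand_to_person_period(df, time_col="time", event_col="event"):
    """Expand one-row-per-subject survival data into person-period form.

    A subject observed through interval t contributes t rows; the binary
    response y is 0 everywhere except the final interval, where it equals
    the event indicator. The expanded data can then be fit with a standard
    binary GLM (e.g. logit or cloglog link), which is why the row count,
    and hence the cost, grows quickly in large studies.
    """
    rows = []
    for _, rec in df.iterrows():
        for t in range(1, int(rec[time_col]) + 1):
            row = rec.to_dict()
            row["interval"] = t
            row["y"] = int(t == rec[time_col] and rec[event_col] == 1)
            rows.append(row)
    return pd.DataFrame(rows)

# toy data: subject 1 has the event in interval 3, subject 2 is censored at 2
toy = pd.DataFrame({
    "id": [1, 2],
    "time": [3, 2],
    "event": [1, 0],
    "x": [0.5, -1.2],
})
long = expand_to_person_period(toy)
```

Here two subjects expand to five rows; with millions of subjects and many intervals, the augmented data set becomes very large, which is the computational bottleneck the paper targets.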
Problem

Research questions and friction points this paper is trying to address.

Efficient estimation for big discrete time-to-event data
Automated variable selection in high-dimensional survival models
Scalable algorithm for complex survival data analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends Batchwise Backfitting to discrete hazard models
Automatically selects relevant predictors efficiently
Scales effectively for large high-dimensional datasets
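To make the batch-wise estimation-plus-selection idea concrete, the following is a stylized sketch only: it cycles through mini-batches, updates one model term at a time (the backfitting element), and keeps an update only if it beats dropping that term under a small complexity penalty. The simulated data, learning rate, and penalty are illustrative assumptions; this is not the paper's exact updating scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# simulated person-period-style design: only the first 2 of 5
# covariates are truly relevant (assumption for illustration)
n, p = 2000, 5
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta)))

def penalized_nll(beta, Xb, yb, lam=0.005):
    """Mean logistic negative log-likelihood plus a penalty per active term."""
    eta = Xb @ beta
    return np.mean(np.log1p(np.exp(eta)) - yb * eta) + lam * np.count_nonzero(beta)

beta = np.zeros(p)
lr, batch = 1.0, 200
for epoch in range(20):
    for start in range(0, n, batch):
        Xb, yb = X[start:start + batch], y[start:start + batch]
        for j in range(p):  # backfitting: update one term at a time
            resid = 1 / (1 + np.exp(-Xb @ beta)) - yb
            cand = beta.copy()
            cand[j] -= lr * (Xb[:, j] @ resid) / len(yb)  # gradient step
            dropped = beta.copy()
            dropped[j] = 0.0
            # selection: keep the update only if it beats removing the term
            if penalized_nll(cand, Xb, yb) <= penalized_nll(dropped, Xb, yb):
                beta = cand
            else:
                beta = dropped
```

Under this toy setup the two relevant coefficients move toward their true values while the noise terms are repeatedly zeroed out, illustrating how batch-wise updating can deliver estimation and variable selection in a single pass over large augmented data.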