AI Summary
This study addresses the lack of efficient and accurate online estimation methods for high-dimensional generalized linear models in batch-free, distributed streaming settings. To this end, the authors propose a gradient-augmented surrogate loss function that approximates the cumulative loss using only historical summaries and integrates with renewable Lasso to enable high-dimensional online inference. The method overcomes the stringent batch-number constraints of existing renewable estimators and supports distributed stream processing under a master-worker architecture, where nodes communicate solely via gradient vectors. Non-asymptotic error bounds are established through theoretical analysis, and experiments demonstrate that the proposed approach significantly outperforms current renewable estimators in both accuracy and efficiency on high-dimensional linear and logistic regression tasks.
Abstract
We study online estimation for high-dimensional generalized linear models with streaming data. First, for the non-distributed setting, we propose a gradient-enhanced surrogate loss that approximates the cumulative loss using only historical summaries. It modifies and improves upon the existing renewable estimation approach for the same model in the high-dimensional setting, and removes the batch-number constraint imposed in previous studies. We then extend the method to distributed streaming data under the master-client architecture, where batches are partitioned across sites and only summaries (gradient vectors) are exchanged. Rather than directly applying the popular method of Jordan et al. (2019) to the surrogate quadratic loss, we adjust it so that the clients do not need to compute the full surrogate loss. We derive non-asymptotic error bounds under high-dimensional scaling, without the stringent constraint on the number of batches required in previous studies. Simulation results under linear and logistic models, together with a real-data application, show improved accuracy over existing renewable estimators.
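To make the idea of estimating from historical summaries concrete, the following is a minimal sketch for the linear-model case only. It is not the authors' gradient-enhanced surrogate loss: for squared-error loss the cumulative loss is determined exactly by the running sums of X^T X and X^T y, so no approximation is needed, whereas the general GLM setting treated in the abstract requires a surrogate. The class name `StreamingLasso`, the parameters `lam` and `n_iter`, and the choice of proximal gradient descent (ISTA) as the solver are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


class StreamingLasso:
    """Hypothetical online Lasso for a linear model.

    Raw batches are discarded after each update; only the summaries
    A = sum_b X_b^T X_b and b = sum_b X_b^T y_b are kept.
    """

    def __init__(self, p, lam):
        self.A = np.zeros((p, p))  # running sum of X^T X
        self.b = np.zeros(p)       # running sum of X^T y
        self.n = 0                 # number of observations seen so far
        self.lam = lam             # L1 penalty level (assumed fixed here)
        self.beta = np.zeros(p)    # current estimate, reused as a warm start

    def update(self, X, y, n_iter=500):
        # Fold the new batch into the summaries.
        self.A += X.T @ X
        self.b += X.T @ y
        self.n += X.shape[0]

        # Minimise (1/2n) beta^T A beta - (1/n) b^T beta + lam * ||beta||_1
        # by ISTA with step size 1/L, L = largest eigenvalue of A/n.
        L = np.linalg.eigvalsh(self.A)[-1] / self.n
        if L <= 0:
            return self.beta
        step = 1.0 / L
        for _ in range(n_iter):
            grad = (self.A @ self.beta - self.b) / self.n
            self.beta = soft_threshold(self.beta - step * grad, step * self.lam)
        return self.beta
```

A possible usage pattern is to call `update(X_batch, y_batch)` once per arriving batch and read the returned coefficient vector. Storing the full p-by-p matrix is only practical for moderate p, and a fixed `lam` ignores that the penalty would normally shrink as more data arrive; both are simplifications of this sketch.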