Federated generalized linear mixed models based on one-time shared summary statistics

📅 2026-05-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses key limitations of existing methods for estimating generalized linear mixed models (GLMMs) without access to individual-level data: ecological bias, inability to handle heterogeneity, and the need for iterative communication. The authors propose generating pseudo-data whose summary statistics match those of the actual but unavailable data, using summary statistics that are shared only once, and then fitting the model to the pseudo-data in place of the actual data. The approach is demonstrated on linear, logistic, and Poisson mixed models. The resulting estimates are identical to those from the actual data up to the third decimal place, with similar bias, confidence interval coverage, and prediction performance. Communication and resource efficiency are what distinguish the approach from existing methods.
📝 Abstract
Data privacy has increasingly become a daunting challenge because it limits data availability, which is essential in estimating statistical models such as generalized linear mixed models. Access to personal data often involves considerable time, effort, and paperwork, which can impede research progress and collaboration. Existing approaches that do not use individual-level data for model estimation are either prone to ecological bias, cannot handle heterogeneity, or require iterative communication. In this paper, we propose an approach to estimate generalized linear mixed models based on summary statistics shared only once. We used linear, logistic, and Poisson mixed models as examples to demonstrate the methodology. Our strategy involves generating pseudo-data whose summary statistics match those of the actual but unavailable data. These pseudo-data are then used for model estimation instead of the actual data. The estimates we achieve are identical (up to the third decimal place) to those derived from actual data and have similar bias, coverage, and prediction performance. Communication and resource efficiency distinguish our approach from existing methods.
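The core idea of pseudo-data that reproduce shared summary statistics can be illustrated with a minimal sketch. The abstract does not say which summary statistics are shared or how the pseudo-data are generated, so everything here is an assumption made for illustration: a single data provider, the sample mean and sample covariance as the shared statistics, and an ordinary linear regression without random effects. The names `make_pseudo_data` and `ols` are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_pseudo_data(mean, cov, n, rng):
    """Return n rows whose sample mean and sample covariance equal `mean` and `cov`.

    Assumption: `cov` is a positive-definite sample covariance (ddof=1) and
    n is larger than the number of columns.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    z = rng.standard_normal((n, mean.size))
    z -= z.mean(axis=0)                                   # sample mean is now exactly 0
    chol_z = np.linalg.cholesky(np.cov(z, rowvar=False))
    z = np.linalg.solve(chol_z, z.T).T                    # sample covariance is now identity
    return mean + z @ np.linalg.cholesky(cov).T           # impose the target mean and covariance


def ols(data):
    """Least-squares fit of the last column on the other columns plus an intercept."""
    design = np.column_stack([np.ones(len(data)), data[:, :-1]])
    return np.linalg.lstsq(design, data[:, -1], rcond=None)[0]


rng = np.random.default_rng(0)

# "Actual" individual-level data that stay with the data provider.
x = rng.normal(size=(200, 2))
y = 1.0 + x @ np.array([0.5, -2.0]) + rng.normal(scale=0.3, size=200)
actual = np.column_stack([x, y])

# Summary statistics shared once (assumed here: mean vector and covariance matrix).
shared_mean = actual.mean(axis=0)
shared_cov = np.cov(actual, rowvar=False)

# The analyst builds pseudo-data from the summaries and fits the model on them.
pseudo = make_pseudo_data(shared_mean, shared_cov, n=200, rng=rng)
beta_actual = ols(actual)
beta_pseudo = ols(pseudo)
print(np.allclose(beta_actual, beta_pseudo))
```

In this simplified setting the two fits agree because least-squares coefficients depend on the data only through the mean and covariance, even though the pseudo-data rows differ from the actual rows. Mixed models with logistic or Poisson outcomes need more than this, and the paper's own procedure should be consulted for those cases.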
Problem

Research questions and friction points this paper is trying to address.

federated learning
generalized linear mixed models
data privacy
summary statistics
ecological bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

federated learning
generalized linear mixed models
summary statistics
pseudo-data
privacy-preserving
Authors

Marie Analiz April Limpoco
Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat), Data Science Institute (DSI), Hasselt University, Hasselt, Belgium

Christel Faes
Interuniversity Institute for Biostatistics and statistical Bioinformatics (I-BioStat), Data Science Institute (DSI), Hasselt University, Hasselt, Belgium

Niel Hens
Professor of Biostatistics
biostatistics, infectious diseases