🤖 AI Summary
In federated learning, data staleness—particularly the aging of continuous streaming data in time-sensitive tasks—severely degrades model performance, yet existing methods fail to jointly optimize data freshness and volume. To address this, we propose DUFL, a novel incentive mechanism that uniquely integrates three control dimensions: server payment, expired-data retention rate, and new-data acquisition volume. We introduce Data Staleness (DoS) as a quantifiable metric and establish its explicit mapping to model accuracy. Further, we formulate a bi-level Stackelberg game with dynamic constraints and derive closed-form optimal strategies. Theoretical analysis and experiments on real-world datasets demonstrate that DUFL significantly improves model accuracy while effectively balancing the freshness–volume trade-off. Our work establishes an interpretable and controllable paradigm for federated learning over time-varying data.
📝 Abstract
Handling data staleness remains a significant challenge in federated learning for highly time-sensitive tasks, where data is generated continuously and its staleness largely affects model performance. Although recent works attempt to mitigate data staleness by tuning the local data update frequency or the client selection strategy, none of them jointly considers data staleness and data volume. In this paper, we propose DUFL (Data Updating in Federated Learning), an incentive mechanism featuring an innovative local data update scheme governed by three knobs: the server's payment, the outdated-data conservation rate, and clients' fresh-data collection volume, which together coordinate the staleness and volume of local data for the best utilities. To this end, we introduce a novel metric called DoS (the Degree of Staleness) to quantify data staleness and conduct a theoretical analysis illustrating the quantitative relationship between DoS and model performance. We model DUFL as a two-stage Stackelberg game with a dynamic constraint, deriving each client's optimal local data update strategy in closed form and an approximately optimal strategy for the server. Experimental results on real-world datasets demonstrate the strong performance of our approach.
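To make the two-stage Stackelberg structure concrete, the sketch below shows the generic shape of such a game: the server (leader) announces a payment, each client (follower) plays a closed-form best response, and the server searches for an approximately optimal payment against those responses. The utility functions here are illustrative toy forms chosen for tractability, not the ones derived in the paper; `client_best_response`, `server_utility`, and `search_payment` are hypothetical names.

```python
# Illustrative sketch only: the quadratic cost and linear value below are
# assumptions, NOT the utilities defined in DUFL.

def client_best_response(p, c):
    """Stage 2: a client maximizing the toy utility p*x - c*x^2 has the
    closed-form best-response data volume x* = p / (2c)."""
    return p / (2 * c)

def server_utility(p, costs, value=1.0):
    """Stage 1: toy server payoff = value of induced data minus payments,
    evaluated against the clients' closed-form best responses."""
    volumes = [client_best_response(p, c) for c in costs]
    total = sum(volumes)
    return (value - p) * total

def search_payment(costs, grid=None):
    """Approximate the server's optimal payment by grid search, mirroring
    how a leader's approximately optimal strategy can be computed."""
    grid = grid or [i / 100 for i in range(1, 101)]
    return max(grid, key=lambda p: server_utility(p, costs))
```

With `value=1.0`, the toy server payoff is `p*(1 - p)` times a positive constant, so the grid search settles near `p = 0.5`; the point is only the leader–follower solution order, with DUFL's actual knobs (payment, conservation rate, collection volume) replacing these toy variables.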