🤖 AI Summary
In federated learning (FL), non-training workloads such as scheduling, personalization, clustering, debugging, and incentivization suffer from high latency and cost because they depend on large volumes of metadata held in centralized cloud storage. To address this, the paper proposes FLStore, a serverless framework for FL metadata storage and non-training computation that unifies the data and compute planes on a serverless cache. FLStore uses tailored, locality-aware caching policies to execute non-training workloads next to the metadata they need, integrates with existing FL frameworks with minimal modifications, and is fault-tolerant and highly scalable. In the evaluation, FLStore reduces average per-request latency by 71% and cost by 92.45% compared to a cloud object store-based aggregator server; against an in-memory cloud cache-based aggregator server, it achieves 64.6% lower average latency and 98.83% lower cost.
📝 Abstract
Federated Learning (FL) is an approach to privacy-preserving Machine Learning (ML) that enables model training across multiple clients without centralized data collection, with an aggregator server coordinating training, aggregating model updates, and storing metadata across rounds. Beyond training, a substantial part of FL systems consists of non-training workloads such as scheduling, personalization, clustering, debugging, and incentivization. Most existing systems rely on the aggregator to handle these workloads and on cloud services for data storage. Because non-training workloads depend on large volumes of metadata, including weight parameters from client updates, hyperparameters, and aggregated updates across rounds, this design results in high latency and increased costs. We propose FLStore, a serverless framework for efficient FL non-training workloads and storage. FLStore unifies the data and compute planes on a serverless cache, enabling locality-aware execution via tailored caching policies that reduce latency and costs. In our evaluations, compared to a cloud object store-based aggregator server, FLStore reduces average per-request latency by 71% and costs by 92.45%, with peak improvements of 99.7% and 98.8%, respectively. Compared to an in-memory cloud cache-based aggregator server, FLStore reduces average latency by 64.6% and costs by 98.83%, with peak improvements of 98.8% and 99.6%, respectively. FLStore integrates seamlessly with existing FL frameworks with minimal modifications, while also being fault-tolerant and highly scalable.
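The idea of unifying the data and compute planes so that non-training workloads run where the metadata is cached can be illustrated with a minimal sketch. All names, the in-process "workers", and the hash-based placement below are illustrative assumptions for exposition, not FLStore's actual implementation:

```python
import hashlib
from collections import defaultdict


class LocalityAwareCache:
    """Illustrative sketch: route FL metadata, and the non-training
    functions that consume it, to the same cache partition
    (compute-to-data locality)."""

    def __init__(self, num_workers=4):
        self.num_workers = num_workers
        # Each "worker" holds its own in-memory metadata partition.
        self.workers = [defaultdict(dict) for _ in range(num_workers)]

    def _worker_for(self, key):
        # Deterministic hash so metadata for a given client/round
        # always lands on, and is served from, the same partition.
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        return h % self.num_workers

    def put(self, key, metadata):
        self.workers[self._worker_for(key)][key] = metadata

    def run(self, key, fn):
        # Execute the non-training function on the partition caching
        # the metadata, instead of shipping data to a central server.
        worker = self.workers[self._worker_for(key)]
        return fn(worker.get(key))


cache = LocalityAwareCache()
cache.put("client7/round3", {"loss": 0.42, "num_samples": 1000})
# A debugging-style query runs where the metadata lives:
result = cache.run("client7/round3", lambda m: m["loss"])
```

In a real serverless deployment the partitions would be separate cache function instances and the caching policy would be tailored to FL access patterns across rounds; the sketch only shows the routing principle.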