🤖 AI Summary
To address the challenge of federated generative modeling over non-IID multi-source image data, this paper proposes a decoupled federated VAE framework featuring group-customized decoders. The method decomposes the latent space into shared semantic representations and client-specific texture features, enabling effective knowledge transfer across heterogeneous clients. It introduces a group-aware decoder architecture with modular, client-adaptive branches—compatible with hierarchical VAEs—and incorporates configurable priors to enhance decoupling robustness. Evaluated on MNIST+FashionMNIST and a multi-source RGB composite dataset (cartoons, faces, animals, ships, remote sensing), the approach achieves over 35% FID reduction compared to state-of-the-art federated VAE baselines. This demonstrates substantial improvements in generated sample quality and cross-domain generalization capability under realistic non-IID data distributions.
📝 Abstract
Federated learning is a machine learning paradigm that enables decentralized clients to collaboratively learn a shared model while keeping all the training data local. While considerable research has focused on federated image generation, particularly Generative Adversarial Networks, Variational Autoencoders have received less attention. In this paper, we address the challenges of non-IID (independently and identically distributed) data environments featuring multiple groups of images of different types. Non-IID data distributions can lead to difficulties in maintaining a consistent latent space and can also result in local generators with disparate texture features being blended during aggregation. We thereby introduce FissionVAE that decouples the latent space and constructs decoder branches tailored to individual client groups. This method allows for customized learning that aligns with the unique data distributions of each group. Additionally, we incorporate hierarchical VAEs and demonstrate the use of heterogeneous decoder architectures within FissionVAE. We also explore strategies for setting the latent prior distributions to enhance the decoupling process. To evaluate our approach, we assemble two composite datasets: the first combines MNIST and FashionMNIST; the second comprises RGB datasets of cartoon and human faces, wild animals, marine vessels, and remote sensing images. Our experiments demonstrate that FissionVAE greatly improves generation quality on these datasets compared to baseline federated VAE models.