Trustformer: A Trusted Federated Transformer

📅 2025-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of privacy leakage and high communication overhead in federated training of Transformer models, this paper proposes the first lightweight federated Transformer training framework that integrates intra-layer k-means weight clustering with Intel SGX-based trusted execution environments. Instead of uploading full model parameters, clients locally cluster weights layer-wise and transmit only the cluster centroids; SGX ensures end-to-end privacy during secure aggregation. The framework reduces communication volume by approximately 72% compared to FedAvg while maintaining state-of-the-art performance on the WMT machine translation benchmark—achieving a BLEU score within 0.3 points of the centralized baseline. To our knowledge, this is the first approach to simultaneously achieve high communication efficiency, strong model utility, and rigorous privacy guarantees—enabling practical, privacy-preserving federated learning for large-scale Transformer models.

📝 Abstract
Transformers, a cornerstone of deep-learning architectures for sequential data, have achieved state-of-the-art results in tasks like Natural Language Processing (NLP). Models such as BERT and GPT-3 exemplify their success and have driven the rise of large language models (LLMs). However, a critical challenge persists: safeguarding the privacy of data used in LLM training. Privacy-preserving techniques like Federated Learning (FL) offer potential solutions, but practical limitations hinder their effectiveness for Transformer training. Two primary issues are (I) the risk of sensitive information leakage due to aggregation methods like FedAvg or FedSGD, and (II) the high communication overhead caused by the large size of Transformer models. This paper introduces a novel FL method that reduces communication overhead while maintaining competitive utility. Our approach avoids sharing full model weights by simulating a global model locally. We apply k-means clustering to each Transformer layer, compute centroids locally, and transmit only these centroids to the server instead of full weights or gradients. To enhance security, we leverage Intel SGX for secure transmission of centroids. Evaluated on a translation task, our method achieves utility comparable to state-of-the-art baselines while significantly reducing communication costs. This provides a more efficient and privacy-preserving FL solution for Transformer models.
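The core compression step described above — cluster each layer's weights with k-means and transmit only the centroids — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the 1-D Lloyd's-style clustering, and the choice of `k` are all assumptions made for the example.

```python
import numpy as np

def kmeans_1d(values, k, iters=20, seed=0):
    # Illustrative 1-D Lloyd's k-means over a layer's flattened weights.
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        assign = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for j in range(k):
            members = values[assign == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, assign

def layerwise_centroids(layers, k=16):
    # Client-side: compress each layer to k centroids; only these
    # (not the full weights or gradients) would be sent to the server.
    out = {}
    for name, weights in layers.items():
        centroids, _ = kmeans_1d(weights.ravel(), k)
        out[name] = centroids
    return out
```

With `k` centroids per layer in place of the full weight tensor, the per-round upload shrinks from the layer's parameter count to `k` values, which is where the communication savings come from.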
Problem

Research questions and friction points this paper is trying to address.

Reducing communication overhead in federated Transformer training
Preventing sensitive information leakage during model aggregation
Enhancing privacy preservation for large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses k-means clustering for weight compression
Leverages Intel SGX for secure centroid transmission
Simulates global model locally to avoid weight sharing
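The last two bullets can be sketched together: the server (inside the SGX enclave, per the paper) aggregates the clients' centroids, and each client then rebuilds an approximate global model locally from the aggregated centroids plus its own cluster assignments. Everything below is a hedged sketch — the index-aligned centroid averaging and the function names are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np

def aggregate_centroids(client_centroids):
    # Server-side (would run inside the SGX enclave): average each
    # layer's centroids across clients, assuming centroid indices
    # are aligned across clients -- an assumption of this sketch.
    agg = {}
    for name in client_centroids[0]:
        agg[name] = np.mean([c[name] for c in client_centroids], axis=0)
    return agg

def reconstruct_layer(assignments, centroids, shape):
    # Client-side: simulate the global layer locally by looking up
    # each weight's cluster in the aggregated centroid table.
    return centroids[assignments].reshape(shape)
```

Because reconstruction happens on the client using assignments that never leave the device, full weights are never shared, matching the "simulate the global model locally" idea.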