Safe-FedLLM: Delving into the Safety of Federated Large Language Models

📅 2026-01-12
🏛️ arXiv.org
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This work addresses the vulnerability of federated large language models (FedLLMs) to malicious clients in open environments, a threat inadequately mitigated by existing defenses. To bridge this gap, the authors propose Safe-FedLLM, the first systematic framework that analyzes the attack surface and defensible characteristics of FedLLMs. Safe-FedLLM introduces a lightweight, three-tier probing defense architecture—operating at the step, client, and shadow levels—that leverages behavioral features of LoRA weight updates to detect and filter malicious contributions via efficient classifiers. Crucially, the approach preserves model utility for benign participants and maintains high accuracy and training efficiency even under strong adversarial settings with numerous malicious clients.

📝 Abstract
Federated learning (FL) addresses data privacy and silo issues in large language models (LLMs). Most prior work focuses on improving the training efficiency of federated LLMs; however, security in open environments has been largely overlooked, particularly defenses against malicious clients. To investigate the safety of LLMs during FL, we conduct preliminary experiments to analyze potential attack surfaces and defensible characteristics from the perspective of Low-Rank Adaptation (LoRA) weights. We find two key properties of FL: 1) LLMs are vulnerable to attacks from malicious clients in FL, and 2) LoRA weights exhibit distinct behavioral patterns that simple classifiers can distinguish. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for federated LLMs that constructs defenses across three dimensions: Step-Level, Client-Level, and Shadow-Level. The core concept of Safe-FedLLM is to perform probe-based discrimination on the LoRA weights locally trained by each client during FL, treating them as high-dimensional behavioral features and using lightweight classification models to determine whether they possess malicious attributes. Extensive experiments demonstrate that Safe-FedLLM effectively enhances the defense capability of federated LLMs without compromising performance on benign data. Notably, our method suppresses the impact of malicious data without significantly slowing training, and remains effective even when many clients are malicious. Our code is available at: https://github.com/dmqx/Safe-FedLLM.
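To make the probe idea concrete, here is a minimal sketch of one interpretation: flatten each client's LoRA update into a feature vector and train a lightweight classifier to flag malicious contributions before aggregation. The paper does not publish this interface; the logistic-regression probe, the feature construction, the threshold, and the synthetic data below are all illustrative assumptions, and the Step/Client/Shadow-Level wiring is omitted.

```python
"""Minimal sketch (not the paper's implementation): treat flattened LoRA
updates as behavioral feature vectors and train a lightweight probe to
filter likely-malicious client updates before server-side aggregation."""
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
DIM = 256  # flattened LoRA (A, B) update dimension; toy-sized here

# Synthetic stand-ins for labeled probe-training data: benign updates
# concentrated near zero, malicious updates shifted (purely illustrative).
benign = rng.normal(0.0, 0.02, size=(50, DIM))
malicious = rng.normal(0.3, 0.02, size=(50, DIM))
X = np.vstack([benign, malicious])
y = np.array([0] * 50 + [1] * 50)  # 1 = malicious

# The "lightweight classifier": a simple logistic-regression probe.
probe = LogisticRegression(max_iter=1000).fit(X, y)

def filter_updates(updates: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Drop updates the probe scores as likely malicious."""
    p_malicious = probe.predict_proba(updates)[:, 1]
    return updates[p_malicious < threshold]

# One simulated FL round: 8 benign clients, 2 malicious ones.
round_updates = np.vstack([rng.normal(0.0, 0.02, size=(8, DIM)),
                           rng.normal(0.3, 0.02, size=(2, DIM))])
kept = filter_updates(round_updates)
print(f"kept {len(kept)} of {len(round_updates)} client updates")
# The server would then aggregate only `kept`, e.g. by averaging.
```

Because the probe only scores flattened weight vectors, this kind of check adds a single forward pass of a small classifier per client update, which is consistent with the abstract's claim of negligible training-speed overhead.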
Problem

Research questions and friction points this paper is trying to address.

Federated Learning
Large Language Models
Security
Malicious Clients
Safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Large Language Models
LoRA
Malicious Client Defense
Probe-based Detection
👥 Authors
Mingxiang Tao, Hainan University
Yu Tian, Tsinghua University
Wenxuan Tu, Hainan University (clustering analysis, graph machine learning, federated learning)
Yue Yang, Hainan University
Xue Yang, Shanghai Jiao Tong University
Xiangyan Tang, Hainan University