AI Summary
This work addresses the limitations of single-user task streams, which lack sufficient diversity to build comprehensive agent skill repertoires, and critiques existing cross-user collaboration approaches that compromise privacy and enforce uniform skill representations, thereby failing to accommodate client heterogeneity. To overcome these challenges, the paper proposes a privacy-preserving federated learning framework that introduces structured semantic skill differences (skill diffs) as the fundamental communication unit, eliminating the need to share raw behavioral trajectories. This design enables strictly personalized skill evolution while safeguarding user privacy. The server dynamically models each client's capability boundary, facilitating collaborative evolution among heterogeneous agents. Experiments across 20 task families demonstrate that the approach improves task success rates by up to 44.4% and reduces computational costs by 37.5% compared to self-evolution baselines.
Abstract
Modern LLM agents increasingly rely on skill libraries to handle complex tasks, making skill evolution a primary driver of self-improvement. However, isolated single-user task streams lack the diversity required to build comprehensive skills. While cross-user collaboration can overcome this data bottleneck, current trajectory-sharing approaches compromise user privacy and impose a uniform global library that fails to accommodate client heterogeneity. We introduce FederatedSkill, a privacy-preserving framework for collaborative agent evolution. Moving beyond raw trajectory sharing, FederatedSkill utilizes semantic skill diffs, structured patches over local libraries, as the fundamental unit of communication. On the server side, an evolution agent aggregates these patches to dynamically model client-specific capability boundaries, facilitating strictly personalized skill evolution rather than a suboptimal global average. Evaluated across 20 distinct agent task families, FederatedSkill demonstrates substantial gains over self-evolving baselines, achieving up to a 44.4% increase in success rate and a 37.5% reduction in computational cost.
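To make the idea of a semantic skill diff more concrete, here is a minimal Python sketch of a structured patch applied to a local skill library. The abstract does not specify the diff format or the server's aggregation logic, so everything here is an assumption made for illustration: the `SkillDiff` schema, the `apply_diffs` and `personalize` helpers, and the use of a plain set of skill names as a stand-in for a client's capability boundary are all hypothetical and are not FederatedSkill's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal, Set


# Hypothetical schema: the abstract does not define the diff format.
@dataclass(frozen=True)
class SkillDiff:
    op: Literal["add", "update", "remove"]
    skill: str             # name of the skill in the library
    description: str = ""  # semantic summary of the skill, not a raw trajectory


def apply_diffs(library: Dict[str, str], diffs: List[SkillDiff]) -> Dict[str, str]:
    """Return a new library with the patches applied in order."""
    patched = dict(library)  # copy, so the caller's library is not mutated
    for d in diffs:
        if d.op == "remove":
            patched.pop(d.skill, None)
        elif d.op == "add":
            # Do not overwrite a skill the client already has.
            patched.setdefault(d.skill, d.description)
        elif d.skill in patched:  # "update" only touches existing skills
            patched[d.skill] = d.description
    return patched


def personalize(diffs: List[SkillDiff], capabilities: Set[str]) -> List[SkillDiff]:
    """Assumed stand-in for server-side personalization: keep only the
    patches whose skill falls inside the client's capability set."""
    return [d for d in diffs if d.skill in capabilities]


local_library = {"parse_csv": "Read a CSV file into rows."}
incoming = [
    SkillDiff("add", "retry_http", "Retry a failed HTTP call with backoff."),
    SkillDiff("update", "parse_csv", "Read a CSV file, handling quoted fields."),
    SkillDiff("add", "render_chart", "Plot a bar chart."),
]
relevant = personalize(incoming, {"parse_csv", "retry_http"})
updated_library = apply_diffs(local_library, relevant)
```

In this sketch only skill names and short descriptions cross the client boundary, which mirrors the abstract's point that patches replace raw trajectories. The filter in `personalize` shows, in the simplest possible form, how different clients could receive different subsets of the same pool of patches instead of one global library.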