🤖 AI Summary
This work addresses the lack of a unified theoretical foundation for gradient aggregation methods in multi-objective optimization, a gap that has left algorithm design and convergence analysis fragmented. The authors propose a framework that guarantees convergence to Pareto stationary points by choosing non-conflicting update directions within the convex hull of the gradients, with projection onto the dual cone used to ensure feasibility. Central to the framework is a “sufficient alignment condition” that serves as a general convergence criterion and clarifies the theoretical connections among existing algorithms. Building on this insight, they derive a new variant, capped MGDA (a modification of the multiple-gradient descent algorithm), from a formulation based on Conditional Value-at-Risk (CVaR). Experiments on synthetic tasks and practical benchmarks support the theory, and capped MGDA shows improved robustness in adversarial federated learning settings.
📝 Abstract
Many machine learning problems involve multiple inherent trade-offs that are best addressed by gradient-based multi-objective optimization (MOO) algorithms. Existing methods are often proposed with various motivations, analyzed case by case, and differ algorithmically in how the component gradients are aggregated at each step. In this work, we develop a unifying framework for gradient aggregation in MOO, establishing (optimal) rates of convergence to Pareto stationarity, the standard measure of performance in MOO. Central to our analysis is a sufficient alignment condition, from which we derive a theorem showing that non-conflicting directions, when chosen within the convex hull of gradients, form a fundamental sufficient condition for convergence. We further show that feasibility can be ensured through projection onto the dual cone, broadening the scope of methods that admit convergence guarantees. In parallel, we present a primal optimization perspective of gradient aggregation that encompasses established algorithms, clarifies their theoretical relationships, and enables the design of new variants. As an illustration, we introduce capped MGDA, derived from a CVaR-based formulation, and demonstrate its robustness in adversarial federated learning. Finally, we validate our theory through experiments on synthetic problems and practical benchmarks.
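To make "non-conflicting directions within the convex hull of gradients" concrete, here is a minimal sketch of the classical min-norm (MGDA-style) aggregation for the special case of two objectives, where the solution has a closed form. It is an illustration under stated assumptions, not the paper's capped MGDA or its general framework: the function names `dot` and `min_norm_direction` and the example gradients are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def min_norm_direction(g1, g2):
    """Return (a, d), where d = a*g1 + (1-a)*g2 is the minimum-norm
    point of the segment between g1 and g2 (their convex hull).

    Illustrative helper (hypothetical name): the classical two-objective
    min-norm aggregation, not the capped variant from the abstract.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every convex combination equals g1.
        return 0.5, list(g1)
    # Unconstrained minimizer of ||g2 + a*(g1 - g2)||^2, clipped to [0, 1].
    a = -dot(diff, g2) / denom
    a = min(1.0, max(0.0, a))
    d = [a * x + (1.0 - a) * y for x, y in zip(g1, g2)]
    return a, d


# Two partially conflicting gradients (made-up values).
g1 = [1.0, 0.0]
g2 = [-0.5, 1.0]
a, d = min_norm_direction(g1, g2)

# Non-conflicting: d has a non-negative inner product with each gradient,
# so stepping along -d does not increase either objective to first order.
print(a, d, dot(d, g1), dot(d, g2))

# Directly opposed gradients: the hull contains 0, so d is (numerically)
# the zero vector, which is the Pareto stationarity condition.
_, d0 = min_norm_direction([1.0, 0.0], [-2.0, 0.0])
print(d0)
```

The min-norm point `d` satisfies `dot(d, g_i) >= dot(d, d)` for both gradients, which is why it never conflicts with either objective. Its norm is also the usual measure of Pareto stationarity: it is zero exactly when some convex combination of the gradients vanishes.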