🤖 AI Summary
To address transmission grid congestion, which grows as electrification increases the power that must be transmitted, this paper proposes a multi-objective reinforcement learning (MORL) approach to power grid topology control through substation reconfiguration. The approach combines Deep Optimistic Linear Support (DOL) with Multi-Objective Proximal Policy Optimization (MOPPO) and explicitly models conflicting objectives (minimizing line loading, topological deviation, and switching frequency) to generate a set of Pareto-optimal policies that exposes the trade-offs between them. In initial case studies, the multi-objective policies are 30% more successful at preventing grid failure under contingencies than the common single-objective RL policy, and 20% more effective when the training budget is reduced. The approach also improves Pareto front approximation compared to a random search baseline.
📝 Abstract
Transmission grid congestion increases as the electrification of various sectors requires transmitting more power. Topology control, through substation reconfiguration, can reduce congestion, but its potential remains under-exploited in operations. One challenge is modeling the topology control problem so that it aligns well with the objectives and constraints of operators. Addressing this challenge, this paper investigates the application of multi-objective reinforcement learning (MORL) to integrate multiple conflicting objectives for power grid topology control. We develop a MORL approach using deep optimistic linear support (DOL) and multi-objective proximal policy optimization (MOPPO) to generate a set of Pareto-optimal policies that balance objectives such as minimizing line loading, topological deviation, and switching frequency. Initial case studies show that the MORL approach can provide valuable insights into objective trade-offs and improve Pareto front approximation compared to a random search baseline. Compared to the common single-objective RL policy, the generated multi-objective RL policies are 30% more successful in preventing grid failure under contingencies and 20% more effective when the training budget is reduced.
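The two notions the abstract relies on, a weighted sum over several objectives and a Pareto-optimal set of policies, can be illustrated with a minimal Python sketch. It is not the paper's DOL or MOPPO implementation. The function names, the weight vector, and the per-policy return values are all hypothetical, and each objective is negated so that larger values are better.

```python
import numpy as np

def scalarize(return_vec, weights):
    """Linear scalarization: collapse a vector return into one weighted sum."""
    return float(np.dot(weights, return_vec))

def pareto_front(returns):
    """Indices of non-dominated rows, assuming every objective is maximized."""
    returns = np.asarray(returns, dtype=float)
    keep = []
    for i, r in enumerate(returns):
        # r is dominated if another row is at least as good everywhere
        # and strictly better in at least one objective.
        dominated = any(
            np.all(other >= r) and np.any(other > r)
            for j, other in enumerate(returns)
            if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical returns per policy, in the order
# (-line loading, -topological deviation, -switching frequency).
policy_returns = [
    [-0.80, -0.10, -0.20],
    [-0.60, -0.40, -0.30],
    [-0.85, -0.15, -0.25],  # worse than the first policy in every objective
    [-0.70, -0.20, -0.50],
]

front = pareto_front(policy_returns)

# Hypothetical operator preference over the three objectives.
weights = np.array([0.5, 0.3, 0.2])
best = max(front, key=lambda i: scalarize(policy_returns[i], weights))
```

With these made-up numbers, the third policy drops out of the front because the first policy is better on all three objectives. Which of the remaining policies is preferred depends entirely on the chosen weights, which is the kind of trade-off a Pareto set makes visible to an operator.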