🤖 AI Summary
Machine learning (ML) research on optimal power flow (OPF) is held back by scarce benchmark datasets, inconsistent evaluation protocols, and poor reproducibility. To address this, the paper introduces PGLearn, an open-source suite of standardized datasets and evaluation tools for ML-based OPF. It makes three main contributions: (1) datasets that reflect realistic operating conditions by capturing both global and local variability, including, for the first time, time-series data for several large-scale systems; (2) unified support for AC, DC, and second-order cone (SOC) OPF formulations; and (3) a standardized end-to-end pipeline covering OPF instance generation and solution, ML training, and reproducible evaluation. The datasets are publicly available on Hugging Face alongside a unified evaluation toolkit. By lowering the barrier to entry and enabling fair, comparable, and reproducible evaluation of ML-based OPF methods, PGLearn provides shared infrastructure for data-driven power system optimization.
📝 Abstract
Machine Learning (ML) techniques for Optimal Power Flow (OPF) problems have recently garnered significant attention, reflecting a broader trend of leveraging ML to approximate and/or accelerate the solution of complex optimization problems. These developments are driven by the increased volatility and scale of energy production in modern and future grids. However, progress in ML for OPF is hindered by the lack of standardized datasets and evaluation metrics across the whole workflow, from generating and solving OPF instances to training and benchmarking machine learning models. To address this challenge, this paper introduces PGLearn, a comprehensive suite of standardized datasets and evaluation tools for ML and OPF. PGLearn provides datasets that are representative of real-life operating conditions by explicitly capturing both global and local variability in the data generation and by including, for the first time, time series data for several large-scale systems. In addition, it supports multiple OPF formulations, including AC, DC, and second-order cone formulations. Standardized datasets are made publicly available to democratize access to this field, reduce the burden of data generation, and enable the fair comparison of various methodologies. PGLearn also includes a robust toolkit for training, evaluating, and benchmarking machine learning models for OPF, with the goal of standardizing performance evaluation across the field. By promoting open, standardized datasets and evaluation metrics, PGLearn aims to democratize and accelerate research and innovation in machine learning applications for optimal power flow problems. Datasets are available for download at https://www.huggingface.co/PGLearn.
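To make the idea of standardized evaluation concrete, the following is a minimal sketch of two metrics that could plausibly be reported for an ML-based OPF model: the relative optimality gap against the solver's objective, and the largest generator bound violation. The abstract does not list the metrics PGLearn uses, so both metrics are assumptions chosen for illustration. The function names and the sample field names (`pred_cost`, `opt_cost`, `pred_pg`, `pg_min`, `pg_max`) are likewise hypothetical and are not part of the PGLearn toolkit.

```python
from statistics import mean


def optimality_gap(predicted_cost, optimal_cost):
    """Relative gap between a predicted objective and the solver's optimum."""
    return abs(predicted_cost - optimal_cost) / abs(optimal_cost)


def max_violation(values, lower, upper):
    """Largest bound violation over all entries; 0.0 if every value is in bounds."""
    return max(
        max(lo - v, v - hi, 0.0)
        for v, lo, hi in zip(values, lower, upper)
    )


def evaluate(samples):
    """Aggregate metrics over a list of samples (hypothetical dict layout)."""
    gaps = [optimality_gap(s["pred_cost"], s["opt_cost"]) for s in samples]
    viols = [
        max_violation(s["pred_pg"], s["pg_min"], s["pg_max"]) for s in samples
    ]
    return {"mean_gap": mean(gaps), "max_violation": max(viols)}


# Two made-up samples, each with two generators (values are illustrative only).
samples = [
    {"pred_cost": 101.0, "opt_cost": 100.0,
     "pred_pg": [0.5, 1.2], "pg_min": [0.0, 0.0], "pg_max": [1.0, 1.0]},
    {"pred_cost": 198.0, "opt_cost": 200.0,
     "pred_pg": [0.3, 0.9], "pg_min": [0.0, 0.0], "pg_max": [1.0, 1.0]},
]
report = evaluate(samples)
print(report)
```

Computing the same metrics over the same published test split for every model is what makes results from different methods directly comparable.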