🤖 AI Summary
This study systematically evaluates symbolic regression (SR) methods for automatically discovering the governing equations of dynamical systems from time-series data. The benchmark covers nine canonical processes drawn from physics, ecology, and epidemiology, including chaotic systems and SIR-type epidemic models. Five state-of-the-art SR algorithms are compared under identical experimental conditions. PySR outperforms the other methods in both equation recovery accuracy and out-of-distribution predictive generalization; several recovered equations match the ground-truth analytical forms with relative error <10⁻³. Methodologically, the work introduces the first unified benchmark for comparing SR methods on nonlinear dynamical systems across domains. It also supports the efficacy and robustness of PySR's evolutionary search for modeling complex dynamics with interpretable equations. The findings position SR as a reliable, interpretable, data-driven tool for dynamical system identification.
📝 Abstract
Discovering equations from data lies at the heart of physics and of many other areas of research, including mathematical ecology and epidemiology. Recently, machine learning methods known as symbolic regression have automated this process. Because several such methods are available in the literature, it is important to compare them, particularly on dynamical systems that describe complex phenomena. In this paper, five symbolic regression methods were used to recover equations from nine dynamical processes, including chaotic dynamics and epidemic models; the PySR method proved the most suitable for inferring equations. Benchmark results demonstrate its high predictive power and accuracy, with some estimates being indistinguishable from the original analytical forms. These results highlight the potential of symbolic regression as a robust tool for inferring and modelling real-world phenomena.
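To make the task of recovering an equation from a dynamical process concrete, here is a minimal sketch. It is not the paper's pipeline. It simulates an SIR epidemic model, estimates dS/dt from the sampled trajectory by finite differences, and recovers the transmission rate by least squares on a single, preselected candidate term. The parameter values (`beta=0.3`, `gamma=0.1`), the step size, and all helper names are assumptions chosen for illustration. A symbolic regression method such as PySR would instead search over many candidate expression forms rather than being handed the term `-S*I`.

```python
# Illustrative sketch only: parameter values and helper names are assumptions,
# not taken from the paper.

def sir_rhs(state, beta, gamma):
    """Right-hand side of the SIR model with a normalized population."""
    s, i, r = state
    return (-beta * s * i, beta * s * i - gamma * i, gamma * i)

def rk4_step(state, dt, beta, gamma):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))
    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(shifted(state, k1, dt / 2), beta, gamma)
    k3 = sir_rhs(shifted(state, k2, dt / 2), beta, gamma)
    k4 = sir_rhs(shifted(state, k3, dt), beta, gamma)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(beta=0.3, gamma=0.1, dt=0.1, steps=1000):
    """Generate a synthetic (S, I, R) time series from assumed parameters."""
    state = (0.99, 0.01, 0.0)
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, beta, gamma)
        trajectory.append(state)
    return trajectory

def central_diff(values, dt):
    """Central finite-difference derivative at the interior sample points."""
    return [(values[k + 1] - values[k - 1]) / (2 * dt)
            for k in range(1, len(values) - 1)]

DT = 0.1
trajectory = simulate(dt=DT)
s = [point[0] for point in trajectory]
i = [point[1] for point in trajectory]

# Regression target: the estimated derivative dS/dt at interior points.
ds_dt = central_diff(s, DT)
# Single candidate term -S*I, evaluated at the same interior points.
feature = [-s[k] * i[k] for k in range(1, len(s) - 1)]

# One-parameter least squares: dS/dt ~ beta_hat * (-S*I).
beta_hat = sum(f * y for f, y in zip(feature, ds_dt)) / sum(f * f for f in feature)
print(f"estimated beta: {beta_hat:.4f}")
```

With noise-free data the estimate should land close to the assumed `beta`. On noisy measurements the finite-difference step usually needs smoothing first, and the choice of candidate terms is exactly what symbolic regression automates.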