🤖 AI Summary
Physical law discovery suffers from a lack of high-quality, controllable benchmarks, hindering model training and rigorous evaluation. To address this, we propose the first unified framework for generating synthetic physical theories with full controllability: it formalizes axiom systems using first-order logic, enforces logical consistency via symbolic constraint solving, and introduces a progressive noise injection mechanism to generate realistic, noisy observational data. The framework supports both full-theory fitting and sub-theory discovery tasks. It integrates axiom verification and data fidelity assessment modules to ensure theoretical soundness and empirical plausibility. We generate multiple scalable synthetic theories and corresponding datasets, and conduct benchmark evaluations across three state-of-the-art symbolic regression systems. Results demonstrate significant improvements in robustness, interpretability, and theory discovery capability, establishing a new standard for evaluating physics-informed symbolic learning methods.
📝 Abstract
Automated methods for discovering new physical laws of nature, starting from a given background theory and data, have recently emerged and show great potential to advance our understanding of the physical world. However, the fact that there are relatively few known theories in the physical sciences has made the training, testing, and benchmarking of these systems difficult. To address these needs, we have developed SynPAT, a system for generating synthetic physical theories, each comprising a set of consistent axioms together with noisy data that are good fits either to the axioms or to a subset of the axioms. We give a detailed description of the inner workings of SynPAT and its various capabilities. We also report on our benchmarking of three recent open-source symbolic regression systems using our generated theories and data.
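
To make the idea of "noisy data that are good fits to the axioms" concrete, the following is a minimal sketch and not SynPAT's actual procedure. It assumes a hypothetical two-axiom toy theory (F = m·a and d = a·t²/2), samples the free variables uniformly, and perturbs the derived quantity with relative Gaussian noise. The function names, sampling ranges, and noise model are all illustrative assumptions.

```python
import random


def sample_noisy_data(n, noise=0.01, seed=0):
    """Sample n points consistent with a toy theory, then perturb the target.

    Hypothetical toy axioms (assumptions, not taken from the paper):
      axiom 1: F = m * a
      axiom 2: d = a * t^2 / 2
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Free variables, drawn from an arbitrary positive range.
        m = rng.uniform(1.0, 10.0)
        F = rng.uniform(1.0, 10.0)
        t = rng.uniform(1.0, 10.0)
        a = F / m                 # follows from axiom 1
        d = 0.5 * a * t ** 2      # follows from axiom 2
        # Relative Gaussian noise on the measured quantity (assumed noise model).
        d_noisy = d * (1.0 + rng.gauss(0.0, noise))
        rows.append({"m": m, "F": F, "t": t, "d": d_noisy})
    return rows


def max_relative_residual(rows):
    """Largest relative deviation from the derived law d = F * t^2 / (2 * m)."""
    worst = 0.0
    for r in rows:
        predicted = r["F"] * r["t"] ** 2 / (2.0 * r["m"])
        worst = max(worst, abs(r["d"] - predicted) / predicted)
    return worst
```

A symbolic regression system given only the columns `m`, `F`, `t`, and `d` would be expected to recover the derived law; `max_relative_residual` gives a quick check of how far the noisy data stray from it.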