EGG-SR: Embedding Symbolic Equivalence into Symbolic Regression via Equality Graph

📅 2025-11-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Symbolic regression suffers from exponential explosion of the expression search space and redundant exploration of mathematically equivalent expressions, leading to computational inefficiency and slow convergence. This paper proposes EGG-SR, the first framework to integrate e-graphs into symbolic regression, explicitly modeling expression equivalence to unify support for Monte Carlo tree search (MCTS) pruning, deep reinforcement learning (DRL) gradient variance reduction, and large language model (LLM)-enhanced feedback. Evaluated on multiple physics benchmarks, EGG-SR significantly outperforms state-of-the-art methods: it reduces normalized mean squared error by up to 42%, accelerates convergence by 3.1×, and improves both equation discovery accuracy and robustness. Its core innovation lies in establishing the e-graph as a unified semantic infrastructure for equivalence-aware optimization across heterogeneous algorithmic components.
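The variance-reduction claim for DRL can be illustrated with a toy experiment (a hypothetical sketch, not the paper's estimator): if syntactically different variants of the same equation receive independent noisy rewards, replacing each sample's reward with the running mean of its equivalence class pools those observations and yields a lower-variance signal.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: three equivalence classes; each syntactic variant
# of a class draws a noisy sample of the same underlying reward.
true_reward = {"c1": 1.0, "c2": 0.5, "c3": 0.2}
samples = [(c, true_reward[c] + random.gauss(0, 0.3))
           for _ in range(200) for c in true_reward]

# Naive estimator: each variant's noisy reward is used as-is.
naive = [r for _, r in samples]

# Equivalence-aware estimator: replace each sample's reward with the
# running mean of its class, pooling information across variants.
class_sum, class_n, pooled = {}, {}, []
for c, r in samples:
    class_sum[c] = class_sum.get(c, 0.0) + r
    class_n[c] = class_n.get(c, 0) + 1
    pooled.append(class_sum[c] / class_n[c])

print(statistics.variance(naive) > statistics.variance(pooled))
```

The pooled estimator keeps the spread between classes (which carries the learning signal) while averaging away per-sample noise, which is the intuition behind aggregating rewards across e-classes.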

📝 Abstract
Symbolic regression seeks to uncover physical laws from experimental data by searching for closed-form expressions, an important task in AI-driven scientific discovery. Yet the exponential growth of the expression search space renders the task computationally challenging. A promising yet underexplored direction for reducing the effective search space and accelerating training lies in symbolic equivalence: many expressions, although syntactically different, define the same function -- for example, $\log(x_1^2x_2^3)$, $\log(x_1^2)+\log(x_2^3)$, and $2\log(x_1)+3\log(x_2)$. Existing algorithms treat such variants as distinct outputs, leading to redundant exploration and slow learning. We introduce EGG-SR, a unified framework that integrates equality graphs (e-graphs) into diverse symbolic regression algorithms, including Monte Carlo Tree Search (MCTS), deep reinforcement learning (DRL), and large language models (LLMs). EGG-SR compactly represents equivalent expressions through the proposed EGG module, enabling more efficient learning by: (1) pruning redundant subtree exploration in EGG-MCTS, (2) aggregating rewards across equivalence classes in EGG-DRL, and (3) enriching feedback prompts in EGG-LLM. Under mild assumptions, we show that embedding e-graphs tightens the regret bound of MCTS and reduces the variance of the DRL gradient estimator. Empirically, EGG-SR consistently enhances multiple baselines across challenging benchmarks, discovering equations with lower normalized mean squared error than state-of-the-art methods. Code is available at: https://www.github.com/jiangnanhugo/egg-sr.
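The equivalence example from the abstract can be sketched with a minimal e-graph: hash-consed e-nodes plus a union-find over e-class ids, so that applying a rewrite like $\log(ab) \to \log(a)+\log(b)$ merges two syntactic variants into one class. This is an illustrative toy (no congruence closure or rebuilding), not the paper's EGG module; all names are hypothetical.

```python
class EGraph:
    """Toy e-graph: hash-consed e-nodes + union-find over e-class ids."""

    def __init__(self):
        self.parent = []   # union-find parent for each e-class id
        self.table = {}    # canonical e-node -> e-class id

    def find(self, c):
        # Path-halving union-find lookup.
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]
            c = self.parent[c]
        return c

    def add(self, op, *children):
        # Canonicalize children, then hash-cons the e-node.
        node = (op,) + tuple(self.find(c) for c in children)
        if node in self.table:
            return self.find(self.table[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.table[node] = cid
        return cid

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
        return a

eg = EGraph()
x1, x2 = eg.add("x1"), eg.add("x2")
two, three = eg.add("2"), eg.add("3")
# log(x1^2 * x2^3)
lhs = eg.add("log", eg.add("*", eg.add("pow", x1, two),
                           eg.add("pow", x2, three)))
# log(x1^2) + log(x2^3)
rhs = eg.add("+", eg.add("log", eg.add("pow", x1, two)),
             eg.add("log", eg.add("pow", x2, three)))
# The rewrite log(a*b) -> log(a) + log(b) justifies merging the classes.
eg.union(lhs, rhs)
print(eg.find(lhs) == eg.find(rhs))  # True
```

Once the two variants share an e-class, any search procedure that keys on e-class ids sees them as a single candidate instead of two.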
Problem

Research questions and friction points this paper is trying to address.

Symbolic regression faces exponential search space growth
Existing methods treat equivalent expressions as distinct outputs
Redundant exploration of equivalent variants wastes computation and slows convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses equality graphs to represent equivalent expressions
Integrates e-graphs into MCTS, DRL, and LLM algorithms
Reduces search space by pruning redundant expression variants
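The pruning idea in the bullets above can be sketched as a deduplication pass: normalize each candidate to a canonical form before evaluation, so syntactic variants are scored only once. This is a hypothetical simplification (sorting arguments of commutative operators), far weaker than full e-graph rewriting.

```python
def canonical(expr):
    """Hypothetical canonicalizer: sort children of commutative ops so
    variants like (x + y) and (y + x) map to the same key."""
    if isinstance(expr, tuple):
        op, *args = expr
        args = tuple(canonical(a) for a in args)
        if op in ("+", "*"):
            args = tuple(sorted(args, key=repr))
        return (op,) + args
    return expr

def dedupe(candidates):
    """Skip candidates already seen up to the canonical form,
    saving one fitness evaluation per pruned variant."""
    seen, unique = set(), []
    for expr in candidates:
        key = canonical(expr)
        if key not in seen:
            seen.add(key)
            unique.append(expr)
    return unique

pool = [("+", "x", "y"), ("+", "y", "x"), ("*", "x", "x")]
print(len(dedupe(pool)))  # 2
```

A real e-graph goes further by also merging variants related through non-trivial rewrites (distributivity, log identities), not just argument order.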
Nan Jiang
University of Texas at El Paso, TX, USA
Ziyi Wang
Purdue University, IN, USA
Yexiang Xue
Assistant Professor, Purdue University
Artificial Intelligence