🤖 AI Summary
This paper addresses automated equation discovery (symbolic regression) in scientific modeling. We propose MGMT, a modular neural-guided equation discovery system built on Monte-Carlo Tree Search (MCTS). Methodologically, MGMT encodes the production rules of a context-free grammar (CFG) as the action space of MCTS, supports both supervised and reinforcement learning for guiding the search, and incorporates two MCTS adaptations: risk-seeking MCTS and AmEx-MCTS. Using MGMT's modular structure, we compare seven architectures (including RNNs, CNNs, and Transformers) for embedding tabular data sets, trained with a contrastive-learning auxiliary task, on an equation discovery task. Empirically, supervised learning outperforms reinforcement learning for almost all module combinations; the experiments indicate an advantage of grammar rules over tokens as the action space; and the two MCTS adaptations can further improve the search. These results support grammar-aware neural search with supervised guidance as a strong approach to symbolic regression.
📝 Abstract
Deep learning approaches are becoming increasingly attractive for equation discovery. We show the advantages and disadvantages of neural-guided equation discovery by reviewing recent papers and reporting experiments with our modular equation discovery system MGMT ($\textbf{M}$ulti-Task $\textbf{G}$rammar-Guided $\textbf{M}$onte-Carlo $\textbf{T}$ree Search for Equation Discovery). The system uses neural-guided Monte-Carlo Tree Search (MCTS) and supports both supervised and reinforcement learning, with a search space defined by a context-free grammar. We summarize seven desirable properties of equation discovery systems, emphasizing the importance of embedding tabular data sets for such learning approaches. Exploiting the modular structure of MGMT, we compare seven architectures (among them RNNs, CNNs, and Transformers) for embedding tabular data sets, trained on the auxiliary task of contrastive learning, on an equation discovery task. For almost all combinations of modules, supervised learning outperforms reinforcement learning. Moreover, our experiments indicate an advantage of using grammar rules as the action space instead of tokens. Two adaptations of MCTS -- risk-seeking MCTS and AmEx-MCTS -- can further improve grammar-guided equation discovery.
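To make the core idea concrete, here is a minimal toy sketch (not the authors' implementation) of how CFG production rules can serve as the action space of a tree search: a state is a partial derivation, an action applies a production to the leftmost nonterminal, and a random rollout (the MCTS simulation step) completes the derivation before the resulting expression is scored against data. The grammar, helper names, and reward are all illustrative assumptions.

```python
import math
import random

# Toy grammar (an assumption for illustration): the nonterminal "E" expands
# via these productions. Each production is one action in the search tree.
GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["x"], ["1"]],
}

def is_complete(symbols):
    # A derivation is complete when no nonterminals remain.
    return all(s not in GRAMMAR for s in symbols)

def expand(symbols, production):
    # Apply a production rule to the leftmost nonterminal (one search action).
    i = next(j for j, s in enumerate(symbols) if s in GRAMMAR)
    return symbols[:i] + production + symbols[i + 1:]

def rollout(symbols, max_steps=30):
    # Finish a partial derivation with random actions, as in MCTS simulation.
    for _ in range(max_steps):
        if is_complete(symbols):
            return symbols
        nt = next(s for s in symbols if s in GRAMMAR)
        symbols = expand(symbols, random.choice(GRAMMAR[nt]))
    return None  # derivation grew too deep; treat as failed rollout

def reward(symbols, xs, ys):
    # Score a finished expression by negative squared error against data.
    if symbols is None:
        return -math.inf
    expr = "".join(symbols)
    return -sum((eval(expr, {"x": x}) - y) ** 2 for x, y in zip(xs, ys))
```

A full system would wrap these pieces in MCTS selection/expansion/backpropagation and replace the random rollout policy with a neural network trained by supervised or reinforcement learning; the grammar guarantees that every derivable expression is syntactically valid, which is the advantage over a flat token-level action space.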