🤖 AI Summary
In symbolic regression, selection operators are typically handcrafted, and existing LLM-driven evolutionary approaches suffer from code bloat and a lack of semantic guidance, hindering both interpretability and evolutionary efficiency. This paper proposes a learning-based evolutionary framework that leverages large language models (LLMs) to automatically synthesize efficient and interpretable selection operators. It introduces two novel mechanisms, semantics-aware evaluation and code-bloat control, integrated with domain-knowledge-enhanced prompt engineering to jointly optimize operator generation for semantic validity and evolutionary efficacy. Evaluated on standard symbolic regression benchmarks, the automatically generated operators consistently outperform nine expert-designed baselines. To the authors' knowledge, this is the first work demonstrating that LLMs can not only match but surpass human experts in algorithmic design, specifically in crafting high-performing, semantically grounded selection operators for evolutionary symbolic regression.
📝 Abstract
Large language models (LLMs) have revolutionized algorithm development, yet their application to symbolic regression, where algorithms automatically discover symbolic expressions from data, remains limited: key components such as selection operators are still typically designed manually by human experts. In this paper, we propose a learning-to-evolve framework that enables LLMs to automatically design selection operators for evolutionary symbolic regression algorithms. We first identify two key limitations of existing LLM-based algorithm evolution techniques: code bloat and a lack of semantic guidance. Bloat produces unnecessarily complex components, while the absence of semantic awareness can lead to ineffective exchange of useful code components; both can reduce the interpretability of the designed algorithm or hinder evolutionary learning progress. To address these issues, we enhance the LLM-based evolution framework for meta symbolic regression with two key innovations: bloat control and a complementary, semantics-aware selection operator. Additionally, we embed domain knowledge into the prompt, enabling the LLM to generate more effective and contextually relevant selection operators. Our experimental results on symbolic regression benchmarks show that LLMs can devise selection operators that outperform nine expert-designed baselines, achieving state-of-the-art performance. This demonstrates that LLMs can exceed expert-level algorithm design for symbolic regression.
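To make the two mechanisms concrete, the sketch below shows a toy selection operator of the kind such a framework might synthesize: a tournament selection that penalizes expression size (bloat control) and rewards semantic diversity, measured as distance between candidates' outputs on the training inputs. This is purely illustrative and not the paper's actual operator; the field names, weights, and tournament scheme are assumptions.

```python
import random

def semantic_distance(out_a, out_b):
    # Mean absolute difference between two candidates' predictions on the
    # training inputs (their "semantics", in GP terminology).
    return sum(abs(a - b) for a, b in zip(out_a, out_b)) / len(out_a)

def select_parent(population, bloat_weight=0.01, diversity_weight=0.01,
                  k=3, rng=random):
    """Tournament selection with a bloat penalty and a semantic-diversity bonus.

    Each individual is a dict with (hypothetical) fields:
      'error'     - training error (lower is better)
      'size'      - number of nodes in the expression tree
      'semantics' - list of outputs on the training inputs
    """
    contestants = rng.sample(population, k)

    def score(ind):
        # Mean semantic distance to the other contestants rewards
        # behaviorally distinct candidates.
        others = [c for c in contestants if c is not ind]
        diversity = sum(semantic_distance(ind['semantics'], o['semantics'])
                        for o in others) / max(len(others), 1)
        # Lower is better: error, plus a size penalty, minus a diversity bonus.
        return (ind['error'] + bloat_weight * ind['size']
                - diversity_weight * diversity)

    return min(contestants, key=score)

population = [
    {'error': 0.5, 'size': 10, 'semantics': [1, 2, 3]},  # compact, accurate
    {'error': 0.5, 'size': 50, 'semantics': [1, 2, 3]},  # bloated duplicate
    {'error': 0.6, 'size': 10, 'semantics': [4, 5, 6]},  # distinct but worse
]
winner = select_parent(population, k=3, rng=random.Random(0))
```

With these (assumed) weights, the size penalty lets the compact individual beat its bloated semantic duplicate, mirroring the abstract's point that uncontrolled bloat yields needlessly complex components without any accuracy gain.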