GP-GOMEA with GPU-Based Fitness Evaluations: Design and Performance Analysis

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the scalability limitations of GP-GOMEA in symbolic regression: its high computational cost limits its use on larger datasets and more complex target expressions. As a first major step toward a fully GPU-accelerated GP-GOMEA, the study introduces a GPU-based fitness evaluation scheme, built on a GPU-friendly representation of GP-GOMEA's template-based individuals and an evaluation strategy that exploits the inherent parallelism of population-based search. This substantially increases evaluation throughput, enabling orders of magnitude more evaluations within the same time budget. Across four standard symbolic regression benchmarks, the added evaluation capacity yields performance improvements, particularly for larger datasets and larger population sizes. It also allows a problem-agnostic evolutionary algorithm to reliably regress one of the largest Feynman equations within four hours, which the authors report as a first. Finally, the work systematically analyzes what makes expressions increasingly difficult for GP-GOMEA, providing insights into how expression structure affects search difficulty.

📝 Abstract

GP-GOMEA is a state-of-the-art evolutionary algorithm for symbolic regression, known for discovering small and interpretable models. However, its computational cost remains substantial, limiting its applicability to larger datasets and more complex target expressions. In contrast, the rise of modern subsymbolic approaches, particularly deep learning, has been driven largely by the massive parallelism offered by GPUs. In this work, we take the first major step toward a fully GPU-accelerated GP-GOMEA by introducing a GPU-based fitness evaluation scheme. We design a GPU-friendly representation of GP-GOMEA's template-based individuals and a corresponding evaluation strategy that exploits the inherent parallelism of population-based search. This substantially increases evaluation throughput, enabling orders of magnitude more evaluations within the same time budget. Across four standard symbolic regression benchmarks, this increased evaluation capacity yields performance improvements, particularly for larger datasets and larger population sizes. Moreover, the ability to efficiently evaluate much larger datasets and more complex templates enables analyses that were previously infeasible, allowing us to systematically analyze what makes expressions increasingly difficult for GP-GOMEA, providing new insights into how expression structure affects search difficulty. Finally, for the first time, this expanded capability allows a problem-agnostic evolutionary algorithm to reliably regress one of the largest Feynman equations within four hours.
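
To make the idea of a GPU-friendly template representation more concrete, the following is a minimal CPU-only sketch in plain Python, not the authors' implementation. It assumes each individual is a fixed-size full binary tree stored as a flat list of opcodes in heap order (children of node i at 2i+1 and 2i+2), with the last level restricted to terminals. The opcode table, the function names `evaluate_template` and `mse_population`, and the choice of mean squared error as fitness are all hypothetical choices made for illustration. Because every individual has the same length and is evaluated by the same loop without recursion, each (individual, data row) pair is an independent unit of work, which is the kind of structure that could be mapped onto GPU threads.

```python
# Hypothetical opcode table (illustrative only, not taken from the paper).
ADD, SUB, MUL, X0, X1, ONE = range(6)
FUNCTIONS = {ADD, SUB, MUL}


def evaluate_template(genome, row):
    """Evaluate one fixed-size template on one data row, bottom-up.

    genome: flat list of opcodes for a full binary tree in heap order.
    Assumes the last tree level holds terminals only.
    Nodes below a terminal are unused; they are still computed so that
    every individual follows the same control flow.
    """
    n = len(genome)
    values = [0.0] * n
    # Children always have larger indices than their parent, so a single
    # reverse pass fills in every child before its parent is visited.
    for i in range(n - 1, -1, -1):
        op = genome[i]
        if op == X0:
            values[i] = row[0]
        elif op == X1:
            values[i] = row[1]
        elif op == ONE:
            values[i] = 1.0
        else:
            if 2 * i + 2 >= n:
                raise ValueError("function node on the last template level")
            left, right = values[2 * i + 1], values[2 * i + 2]
            if op == ADD:
                values[i] = left + right
            elif op == SUB:
                values[i] = left - right
            else:  # MUL
                values[i] = left * right
    return values[0]


def mse_population(population, X, y):
    """Mean squared error of every individual over the whole dataset.

    The two nested loops are independent across individuals and rows;
    on a GPU they could be flattened into one parallel grid.
    """
    fitness = []
    for genome in population:
        total = 0.0
        for row, target in zip(X, y):
            diff = evaluate_template(genome, row) - target
            total += diff * diff
        fitness.append(total / len(y))
    return fitness


# Height-2 templates (7 nodes each).
exact = [ADD, MUL, ONE, X0, X1, X0, X0]   # encodes x0 * x1 + 1
crude = [X0, ADD, ADD, X0, X0, X0, X0]    # encodes x0; all other nodes unused
X = [[1.0, 2.0], [2.0, 3.0], [0.0, 5.0]]
y = [x0 * x1 + 1.0 for x0, x1 in X]
print(mse_population([exact, crude], X, y))
```

In this sketch the unused nodes cost a few extra operations per evaluation, but in exchange all individuals share one memory layout and one instruction sequence, a trade-off that tends to suit batched hardware.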

Problem

Research questions and friction points this paper is trying to address.

symbolic regression

computational cost

GP-GOMEA

large datasets

complex expressions

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU acceleration

symbolic regression

GP-GOMEA