🤖 AI Summary
Balancing accuracy and interpretability remains a core challenge in symbolic regression. This paper studies multi-objective modular GP-GOMEA, proposing enhancements to improve its performance and exploring an additional objective that rewards subexpression reuse alongside expression size and prediction accuracy. Pareto front quality is assessed with the hypervolume metric. Experiments show that, even with these enhancements, single-objective GP-GOMEA with a multi-objective archive used only for logging achieves a higher average hypervolume (+12.7%) when optimizing for size and accuracy. The findings expose limitations of multi-objective mechanisms in symbolic regression and offer empirical guidance on when a single-objective approach should be preferred.
📝 Abstract
In Symbolic Regression (SR), achieving a proper balance between accuracy and interpretability remains a key challenge. The Genetic Programming variant of the Gene-pool Optimal Mixing Evolutionary Algorithm (GP-GOMEA) is of particular interest as it achieves state-of-the-art performance using a template that limits the size of expressions. A recently introduced extension, modular GP-GOMEA, is capable of decomposing expressions using multiple subexpressions, further increasing the chances of interpretability. However, modular GP-GOMEA may create larger expressions, increasing the need to balance size and accuracy. A multi-objective variant of GP-GOMEA exists, which can be used, for instance, to optimize for size and accuracy simultaneously, discovering their trade-off. However, even with the enhancements that we propose in this paper to improve the performance of multi-objective modular GP-GOMEA, when optimizing for size and accuracy, the single-objective version, in which a multi-objective archive is used only for logging, still consistently finds a better average hypervolume. We consequently analyze when a single-objective approach should be preferred. Additionally, we explore an objective that stimulates re-use in multi-objective modular GP-GOMEA.
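To make the evaluation idea concrete, the following is a minimal sketch of a logging-only archive of non-dominated (size, error) pairs and a two-objective hypervolume computation. It is not the paper's implementation: the function names `dominates`, `update_archive`, and `hypervolume_2d`, the example values, and the reference point are all assumptions chosen for illustration, and both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# Hypothetical representation: (expression size, error), both minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def update_archive(archive: List[Point], candidate: Point) -> List[Point]:
    """Offer one candidate to the archive; the archive never steers the search."""
    if candidate in archive or any(dominates(p, candidate) for p in archive):
        return archive  # duplicate or dominated: nothing to log
    # Keep only members the candidate does not dominate, then add it.
    return [p for p in archive if not dominates(candidate, p)] + [candidate]


def hypervolume_2d(front: List[Point], ref: Point) -> float:
    """Area dominated by the front, bounded by the reference point `ref`."""
    # Only points strictly better than the reference point contribute.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    area, best_err = 0.0, ref[1]
    for size, err in pts:  # ascending size
        if err < best_err:  # skip points dominated by a smaller expression
            area += (ref[0] - size) * (best_err - err)
            best_err = err
    return area


if __name__ == "__main__":
    # Made-up (size, error) pairs standing in for expressions seen during a run.
    seen = [(5, 0.1), (3, 0.4), (4, 0.5), (1, 0.8), (3, 0.4)]
    archive: List[Point] = []
    for candidate in seen:
        archive = update_archive(archive, candidate)
    print(sorted(archive))
    print(hypervolume_2d(archive, ref=(6, 1.0)))
```

In this sketch a single-objective run would call `update_archive` on every evaluated expression purely as a side effect, so the resulting front can be compared with that of a multi-objective run using the same `hypervolume_2d` and the same reference point.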