Symbolically Regressing Fish Biomass Spectral Data: A Linear Genetic Programming Method with Tunable Primitives

📅 2025-05-28
🤖 AI Summary
Fish biomass spectral data suffer from high noise levels and limited sample sizes, hindering effective pattern discovery. Method: We formulate the modeling task as symbolic regression and propose a linear genetic programming approach with tunable basis elements. By dynamically optimizing intrinsic coefficients within each basis element, our method significantly enhances pattern mining capability and generalization performance under small-sample conditions; the resulting models are compact and physically interpretable. Contribution/Results: Integrated with spectral preprocessing and feature interpretability analysis, our method outperforms all baselines across ten fish biomass component prediction tasks: average prediction error decreases by 12.7%, model size shrinks by 68%, inference speed increases 3.2×, and biologically meaningful key spectral bands, such as 420–450 nm and 670–690 nm, are successfully identified.

πŸ“ Abstract
Machine learning techniques play an important role in analyzing spectral data. Spectral data of fish biomass are useful in fish production because they carry many important chemical properties of fish meat. However, existing machine learning techniques struggle to comprehensively discover hidden patterns in fish biomass spectral data, since the data are often very noisy and training samples are limited. To better analyze fish biomass spectral data, this paper models the task as a symbolic regression problem and solves it with a linear genetic programming method using newly proposed tunable primitives. In symbolic regression, linear genetic programming automatically synthesizes regression models from the given primitives and training data. The tunable primitives further improve the approximation ability of the regression models by tuning their inherent coefficients. Our empirical results on ten fish biomass targets show that the proposed method improves the overall performance of fish biomass composition prediction. The synthesized regression models are compact and interpretable, which allows us to highlight useful features across the spectrum. Further investigation also verifies that the proposed method generalizes well across various spectral data treatments and to other symbolic regression problems.
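
To make the idea of a linear genetic programming model concrete, the following is a minimal sketch of how a register-based program might be executed on one spectral sample. It is not the authors' implementation: the primitive set, the `Instruction` layout, the register count, and the convention that register 0 holds the output are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical primitive set; the primitives used in the paper are not listed here.
PRIMITIVES: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    # Protected division: return 1.0 when the divisor is (almost) zero.
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}


@dataclass
class Instruction:
    dest: int  # register that receives the result
    op: str    # key into PRIMITIVES
    src1: int  # first source register
    src2: int  # second source register


def run_program(program: List[Instruction],
                features: Sequence[float],
                n_registers: int = 4) -> float:
    """Execute the instruction sequence and return register 0 (assumed output)."""
    # Assumed initialisation: registers are filled by cycling through the features.
    regs = [features[i % len(features)] for i in range(n_registers)]
    for ins in program:
        regs[ins.dest] = PRIMITIVES[ins.op](regs[ins.src1], regs[ins.src2])
    return regs[0]


# Two-instruction program: r2 = r0 * r1, then r0 = r2 + r1.
program = [Instruction(2, "mul", 0, 1), Instruction(0, "add", 2, 1)]
prediction = run_program(program, [2.0, 3.0])
```

Because the model is just a short list of instructions, it can be read back as a formula (here `r0 * r1 + r1`), which is the kind of compactness the abstract associates with interpretability.
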
Problem

Research questions and friction points this paper is trying to address.

Analyzing noisy fish biomass spectral data with limited samples
Improving prediction of fish biomass composition via symbolic regression
Enhancing model interpretability and generality for spectral data analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear genetic programming for symbolic regression
Tunable primitives enhance model approximation
Compact interpretable models for spectral data
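
As a rough illustration of what tuning a primitive's coefficient could look like, the sketch below adjusts a single coefficient by random perturbation and keeps a change only when the training error drops. The primitive form `a + w * b`, the hill-climbing tuner, and the toy data are assumptions made for this example; the page does not specify how the paper defines or tunes its primitives.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (input a, input b, target y)


def tunable_primitive(a: float, b: float, w: float) -> float:
    """Hypothetical tunable primitive: a + w * b, where w is the tunable coefficient."""
    return a + w * b


def mse(w: float, data: List[Sample]) -> float:
    """Mean squared error of the primitive on the data for a given coefficient."""
    return sum((tunable_primitive(a, b, w) - y) ** 2 for a, b, y in data) / len(data)


def tune_coefficient(data: List[Sample], w0: float = 1.0, steps: int = 200,
                     sigma: float = 0.5, seed: int = 0) -> float:
    """Hill climbing: perturb w with Gaussian noise, accept only improvements."""
    rng = random.Random(seed)
    best_w, best_err = w0, mse(w0, data)
    for _ in range(steps):
        candidate = best_w + rng.gauss(0.0, sigma)
        err = mse(candidate, data)
        if err < best_err:
            best_w, best_err = candidate, err
    return best_w


# Toy data generated with a known coefficient of 2.5 (illustrative only).
inputs = [(0.1, 0.4), (0.3, 0.2), (0.5, 0.9), (0.7, 0.6)]
data = [(a, b, a + 2.5 * b) for a, b in inputs]
w = tune_coefficient(data)
```

A gradient-based or closed-form update would also work for a coefficient that enters linearly like this one; the perturbation approach is shown only because it needs no assumptions about differentiability.
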
Zhixing Huang
Victoria University of Wellington
Genetic Programming, Combinatorial Optimization, Program Synthesis, Symbolic Optimization
Bing Xue
Meta Superintelligence Labs
LLM, machine learning for healthcare, representation learning, generative models
Mengjie Zhang
Centre for Data Science and Artificial Intelligence & School of Engineering and Computer Science, Victoria University of Wellington, Wellington 6140, New Zealand
Jeremy S. Ronney
Department of Chemistry, University of Otago, Dunedin, New Zealand
Keith C. Gordon
MacDiarmid Institute for Advanced Materials and Nanotechnology, Chemistry Department, University of Otago, Dunedin, New Zealand
D. Killeen
The New Zealand Institute for Plant and Food Research Limited, Nelson, New Zealand