🤖 AI Summary
In complex network systems, strong multivariate interactions impede mechanistic interpretation of underlying physical principles. Method: We propose the first pre-trained symbolic regression framework tailored for complex networks, integrating graph representation learning to model variable relationships, sparse optimization to enhance search efficiency, and semantic constraints to guide the discovery of interpretable equations. Contribution/Results: Innovatively adapting the pre-training paradigm to symbolic regression, our framework enables dynamic mechanism transfer across physics, biochemistry, ecology, and epidemiology. Experiments demonstrate a threefold improvement in equation discovery efficiency; on global pandemic data, it achieves state-of-the-art fitting accuracy and successfully recovers epidemiologically meaningful transmission interaction laws—achieving both high fidelity and strong interpretability.
📝 Abstract
In science, we are interested not only in forecasting but also in understanding how predictions are made, specifically what the interpretable underlying model looks like. Data-driven machine learning technology can significantly streamline the complex and time-consuming traditional manual process of discovering scientific laws, helping us gain insights into fundamental issues in modern science. In this work, we introduce a pre-trained symbolic foundation regressor that can effectively compress complex data with numerous interacting variables while producing interpretable physical representations. Our model has been rigorously tested on non-network symbolic regression, symbolic regression on complex networks, and the inference of network dynamics across various domains, including physics, biochemistry, ecology, and epidemiology. The results indicate a remarkable improvement in equation inference efficiency, being three times more effective than baseline approaches while maintaining accurate predictions. Furthermore, we apply our model to uncover more intuitive laws of interaction transmission from global epidemic outbreak data, achieving optimal data fitting. This model extends the application boundary of pre-trained symbolic regression models to complex networks, and we believe it provides a foundational solution for revealing the hidden mechanisms behind changes in complex phenomena, enhancing interpretability, and inspiring further scientific discoveries.