Advancing Symbolic Discovery on Unsupervised Data: A Pre-training Framework for Non-degenerate Implicit Equation Discovery

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Discovering implicit equations $f(\mathbf{x}) = 0$ from unsupervised scientific data (e.g., particle trajectories, astronomical observations) remains challenging, as existing symbolic regression methods suffer from degraded performance due to the prevalence of degenerate solutions in the search space. Method: We propose PIE, a pre-trained neural-symbolic framework that reformulates implicit equation discovery as a "scientific relation translation" task, enabling end-to-end inference of non-degenerate mathematical structural skeletons. PIE integrates domain-specific scientific priors into synthetic pre-training data and jointly leverages large language models and neural-symbolic reasoning. Contribution/Results: On diverse real-world unsupervised scientific datasets, PIE achieves, for the first time, high-accuracy and robust discovery of non-degenerate implicit equations, significantly outperforming state-of-the-art approaches. Crucially, it breaks the conventional symbolic regression paradigm that relies on labeled input-output pairs, enabling purely unsupervised, physics-informed equation learning.

📝 Abstract
Symbolic regression (SR) -- which learns symbolic equations describing the underlying relation from input-output pairs -- is widely used for scientific discovery. However, much real-world scientific data (e.g., particle trajectories and astrophysical observations) is unsupervised, devoid of explicit input-output pairs. In this paper, we focus on symbolic implicit equation discovery, which aims to recover from unsupervised data a mathematical relation that follows an implicit equation $f(\mathbf{x}) = 0$. Due to the dense distribution of degenerate solutions (e.g., $f(\mathbf{x}) = x_i - x_i$) in the discrete search space, most existing SR approaches customized for this task fail to achieve satisfactory performance. To tackle this problem, we introduce a novel pre-training framework -- namely, Pre-trained neural symbolic model for Implicit Equation (PIE) -- to discover implicit equations from unsupervised data. The core idea is to formulate implicit equation discovery on unsupervised scientific data as a translation task and to utilize the prior learned from the pre-training dataset to infer non-degenerate skeletons of the underlying relation end-to-end. Extensive experiments show that, leveraging the prior from a pre-trained language model, PIE effectively tackles the problem of degenerate solutions and significantly outperforms all existing SR approaches. PIE marks an encouraging step towards general scientific discovery on unsupervised data.
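The abstract's point about degenerate solutions can be illustrated with a minimal sketch (not the authors' code): under a residual-based score, a trivial candidate like $f(\mathbf{x}) = x_1 - x_1$ fits any dataset perfectly, so a naive search cannot tell it apart from a genuine relation. The circle data and mean-absolute-residual score below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unsupervised data: points sampled from the unit circle, so the
# underlying implicit relation is f(x) = x1^2 + x2^2 - 1 = 0.
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta)])

def residual(f, X):
    """Mean absolute residual |f(x)| over the dataset."""
    return float(np.mean(np.abs(f(X))))

# Degenerate candidate: identically zero on ANY data, so a
# residual-based score cannot reject it.
degenerate = lambda X: X[:, 0] - X[:, 0]

# The true, non-degenerate relation the search should recover.
true_relation = lambda X: X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0

print(residual(degenerate, X))     # exactly 0.0, yet meaningless
print(residual(true_relation, X))  # ~0.0, and meaningful
```

Both candidates score (near-)zero residual, which is precisely why the paper argues for a learned prior over non-degenerate skeletons rather than residual minimization alone.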
Problem

Research questions and friction points this paper is trying to address.

Discovering implicit equations from unsupervised scientific data
Addressing degenerate solutions in symbolic regression
Enhancing symbolic discovery with pre-trained neural models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained neural symbolic model for implicit equations
Formulates implicit equation discovery as translation task
Utilizes pre-trained language model to avoid degenerate solutions
Kuang Yufei
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Wang Jie
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Haotong Huang
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Mingxuan Ye
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China
Fangzhou Zhu
Noah's Ark Lab, Huawei Technologies
Optimization · Linear Programming · Mixed Integer Programming
Li Xijun
Shanghai Jiao Tong University
Jianye Hao
Huawei Noah's Ark Lab/Tianjin University
Multiagent Systems · Embodied AI
Wu Feng
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China