How Low Can You Go? Active Learning for Sparse Model Discovery in the Ultra-Low-Data Limit

📅 2026-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of identifying the governing equations of complex dynamical systems when data are extremely scarce and expensive to acquire. The authors propose an active learning strategy that couples active sampling with sparse dynamics discovery. Using ensemble SINDy (E-SINDy) to quantify epistemic uncertainty, the method iteratively selects new measurements in the regions of state space that are most informative for model identification. This improves the data efficiency of discovering governing ordinary and partial differential equations under severely limited data budgets. On three benchmark systems (Lorenz, Burgers', and Kuramoto–Sivashinsky), the method identifies the governing equations with significantly fewer data points than random sampling; for the Lorenz system, this is shown across varying data budgets and noise levels.

📝 Abstract

Identifying the governing equations of complex dynamical systems remains a fundamental challenge across science and engineering. While early approaches relied on empirical data and heuristics, modern data-driven methods offer greater flexibility and fewer assumptions. However, data acquisition in real-world settings is often expensive. This work addresses this challenge by introducing an active learning strategy for dynamics discovery in the ultra-low-data limit. Rather than sampling randomly, our method iteratively prioritizes regions that are most informative for model identification. This approach builds on Sparse Identification of Nonlinear Dynamics (SINDy) and uses an ensemble extension, E-SINDy, to estimate epistemic uncertainty and guide the sampling for both ordinary and partial differential equations (ODEs/PDEs). For ODEs, an exhaustive analysis is conducted on the Lorenz system across varying data budgets and noise levels. For PDEs, two systems with contrasting dynamical characteristics are examined: Burgers' equation, where a sharp shock front creates a distinction between informative and uninformative regions, and the Kuramoto-Sivashinsky equation, which presents a more spatially complex sampling landscape. Across all scenarios, the proposed method accurately identifies the governing dynamics with significantly fewer data samples than random sampling.
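The loop described in the abstract (fit an ensemble of sparse models, score candidate locations by ensemble disagreement, measure where disagreement is largest, refit) can be illustrated with a minimal NumPy sketch on the Lorenz system. This is not the authors' implementation. Several choices here are assumptions made only for illustration: the acquisition score is the variance of the ensemble's predicted derivatives, the ensemble is built by bootstrapping data rows, derivatives are assumed to be measurable directly at any queried state, and all function names, the candidate pool, the threshold, and the budgets are hypothetical.

```python
import numpy as np

def lorenz(X, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """True Lorenz right-hand side, used here only as the measurement oracle."""
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def library(X):
    """Polynomial library up to degree 2: 1, x, y, z, x^2, xy, xz, y^2, yz, z^2."""
    x, y, z = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([np.ones(len(X)), x, y, z,
                            x * x, x * y, x * z, y * y, y * z, z * z])

def stlsq(Theta, dX, threshold=0.5, n_iter=10):
    """Sequentially thresholded least squares, the sparse regression used by SINDy."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):  # refit each state equation on its active terms
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dX[:, k], rcond=None)[0]
    return Xi

def esindy(X, dX, rng, n_models=30):
    """Assumed ensemble: one STLSQ fit per bootstrap resample of the data rows."""
    Theta = library(X)
    n = len(X)
    fits = [stlsq(Theta[idx], dX[idx])
            for idx in (rng.integers(0, n, size=n) for _ in range(n_models))]
    return np.stack(fits)  # shape (n_models, n_terms, n_states)

def acquisition(models, candidates):
    """Assumed score: variance of predicted derivatives across the ensemble."""
    preds = np.einsum("ct,mts->mcs", library(candidates), models)
    return preds.var(axis=0).sum(axis=1)

def active_discovery(n_init=12, n_queries=20, noise=0.01, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical candidate pool: random states in a box around the attractor.
    pool = rng.uniform([-20, -25, 0], [20, 25, 50], size=(500, 3))

    def measure(P):  # noisy derivative measurement at the queried states
        return lorenz(P) + noise * rng.standard_normal(P.shape)

    chosen = [int(i) for i in rng.choice(len(pool), n_init, replace=False)]
    X, dX = pool[chosen], measure(pool[chosen])
    for _ in range(n_queries):
        score = acquisition(esindy(X, dX, rng), pool)
        score[chosen] = -np.inf          # never re-query a measured point
        j = int(np.argmax(score))        # most uncertain candidate
        chosen.append(j)
        X = np.vstack([X, pool[j:j + 1]])
        dX = np.vstack([dX, measure(pool[j:j + 1])])
    # Aggregate the final ensemble with the coefficient-wise median.
    return np.median(esindy(X, dX, rng), axis=0), X

if __name__ == "__main__":
    Xi, X = active_discovery()
    print(np.round(Xi, 2))
```

Each column of the returned coefficient matrix corresponds to one state equation and each row to one library term. Swapping the `argmax` for a uniformly random pick from the pool gives a random-sampling baseline under the same measurement budget, which is the kind of comparison the abstract reports.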

Problem

Research questions and friction points this paper is trying to address.

active learning

sparse model discovery

ultra-low-data limit

dynamics discovery

governing equations

Innovation

Methods, ideas, or system contributions that make the work stand out.

active learning

SINDy

ultra-low-data