Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning

📅 2025-05-05
🤖 AI Summary
Problem: Surrogate models for physics-based simulation have limited generalizability because large-scale, cross-domain, standardized datasets are lacking. Method: This work introduces PLAID, a unified datamodel built on HDF5 and JSON Schema that defines extensible data structures compatible with multiple physics domains, including structural mechanics and fluid dynamics. It comes with an open-source library and a benchmark platform integrated with Hugging Face, supporting dataset creation, reading, manipulation, and evaluation. Contribution/Results: Six simulation datasets are released, together with baseline results for several surrogate modeling methods. The work aims to provide a standardized data infrastructure for AI for Science in physics-based simulation, promoting data sharing, model reproducibility, and community collaboration as a foundation for surrogate model development.

📝 Abstract
Machine learning-based surrogate models have emerged as a powerful tool to accelerate simulation-driven scientific workflows. However, their widespread adoption is hindered by the lack of large-scale, diverse, and standardized datasets tailored to physics-based simulations. While existing initiatives provide valuable contributions, many are limited in scope: they focus on specific physics domains, rely on fragmented tooling, or adhere to overly simplistic datamodels that restrict generalization. To address these limitations, we introduce PLAID (Physics-Learning AI Datamodel), a flexible and extensible framework for representing and sharing datasets of physics simulations. PLAID defines a unified standard for describing simulation data and is accompanied by a library for creating, reading, and manipulating complex datasets across a wide range of physical use cases (gitlab.com/drti/plaid). We release six carefully crafted datasets under the PLAID standard, covering structural mechanics and computational fluid dynamics, and provide baseline benchmarks using representative learning methods. Benchmarking tools are made available on Hugging Face, enabling direct participation by the community and contribution to ongoing evaluation efforts (huggingface.co/PLAIDcompetitions).
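To make the idea of a unified datamodel for simulation data more concrete, here is a minimal sketch of how samples on unstructured meshes might be organized: each sample holds a mesh, named scalars, and named nodal fields, while a separate task description lists inputs, outputs, and train/test splits. This is an illustration under stated assumptions only. All class, method, and field names (`Mesh`, `Sample`, `ProblemDefinition`, `SimulationDataset`, `add_field`, `subset`) are hypothetical and do not reproduce the API of the PLAID library at gitlab.com/drti/plaid; its documentation describes the actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# NOTE: hypothetical classes for illustration; not the PLAID library API.
@dataclass
class Mesh:
    """Unstructured mesh: node coordinates plus element connectivity."""
    nodes: list[tuple[float, ...]]   # one coordinate tuple per node
    elements: list[tuple[int, ...]]  # node indices of each element


@dataclass
class Sample:
    """One simulation: a mesh, named scalars, and named nodal fields."""
    mesh: Mesh
    scalars: dict[str, float] = field(default_factory=dict)
    fields: dict[str, list[float]] = field(default_factory=dict)

    def add_field(self, name: str, values: list[float]) -> None:
        # A nodal field must carry exactly one value per mesh node.
        if len(values) != len(self.mesh.nodes):
            raise ValueError(
                f"field {name!r} has {len(values)} values, "
                f"expected {len(self.mesh.nodes)}"
            )
        self.fields[name] = list(values)

    def get(self, name: str) -> float | list[float]:
        # Look a feature up by name, whether it is a scalar or a field.
        if name in self.scalars:
            return self.scalars[name]
        return self.fields[name]


@dataclass
class ProblemDefinition:
    """Task description (an assumption of this sketch), kept apart from samples."""
    inputs: list[str]
    outputs: list[str]
    split: dict[str, list[int]]  # split name -> sample indices


@dataclass
class SimulationDataset:
    samples: list[Sample] = field(default_factory=list)

    def add(self, sample: Sample) -> int:
        # Store the sample and return its index in the dataset.
        self.samples.append(sample)
        return len(self.samples) - 1

    def subset(self, problem: ProblemDefinition, split: str) -> list[Sample]:
        # Select the samples assigned to one split by the task description.
        return [self.samples[i] for i in problem.split[split]]


# Usage: three toy "simulations" on a two-triangle square mesh.
square = Mesh(
    nodes=[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    elements=[(0, 1, 2), (0, 2, 3)],
)
ds = SimulationDataset()
for load in (1.0, 2.0, 3.0):
    s = Sample(mesh=square, scalars={"load": load})
    # Made-up displacement values, for illustration only.
    s.add_field("displacement", [0.0, 0.1 * load, 0.2 * load, 0.1 * load])
    s.scalars["max_displacement"] = max(s.fields["displacement"])
    ds.add(s)

problem = ProblemDefinition(
    inputs=["load"],
    outputs=["displacement", "max_displacement"],
    split={"train": [0, 1], "test": [2]},
)
train = ds.subset(problem, "train")
test = ds.subset(problem, "test")
print(len(train), len(test))  # 2 1
```

Keeping the task description separate from the samples means the same stored simulations could serve several learning problems (for example, predicting a full field or only a scalar output) without duplicating data.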

Problem

Lack of large-scale, diverse physics simulation datasets for AI
Existing datasets are limited in scope and standardization
Need for a flexible framework for sharing physics simulation data

Innovation

PLAID framework for standardized physics simulation datasets
Unified datamodel for diverse physical use cases
Open benchmarks on Hugging Face for community participation

Authors
F. Casenave (SafranTech)
Xavier Roynard (SafranTech)
B. Staber (SafranTech)
N. Akkari (SafranTech)
William Piat (SafranTech)
M. Bucci (SafranTech)
A. Kabalan (SafranTech, Ecole des Ponts ParisTech (CERMICS))
Xuan Minh Vuong Nguyen (SafranTech)
Luca Saverio (SafranTech, Ecole Polytechnique (CMAP), ONERA (DAAA))
Raphael Carpintero Perez (SafranTech)
Anthony Kalaydjian (SafranTech, EPFL)
Samy Fouché (SafranTech, ENS Paris-Saclay)
Thierry Gonon (SafranTech)
Ghassan Najjar (SafranTech)
Emmanuel Menier (Inria)
Matthieu Nastorg (Augur, Inria)
Christian Rey (SafranTech)