Physics-Learning AI Datamodel (PLAID) datasets: a collection of physics simulations for machine learning

📅 2025-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Surrogate models for physics-based simulation generalize poorly because large-scale, cross-domain, standardized datasets are lacking. Method: This work introduces PLAID, a unified data modeling framework built on HDF5 and JSON Schema that defines extensible data structures compatible with multiple physics domains, including structural mechanics and fluid dynamics. It provides an open-source toolchain and a Hugging Face-integrated benchmark platform that together cover data generation, parsing, and evaluation. Contribution/Results: We release six simulation datasets and report baseline results for several surrogate modeling methods. To our knowledge, this is the first systematic effort to establish a standardized data infrastructure for AI for Science in physics-based simulation. The framework promotes data sharing, model reproducibility, and community collaboration, laying the groundwork for a shared data ecosystem for surrogate model development.

📝 Abstract

Machine learning-based surrogate models have emerged as a powerful tool to accelerate simulation-driven scientific workflows. However, their widespread adoption is hindered by the lack of large-scale, diverse, and standardized datasets tailored to physics-based simulations. While existing initiatives provide valuable contributions, many are limited in scope: they focus on specific physics domains, rely on fragmented tooling, or adhere to overly simplistic datamodels that restrict generalization. To address these limitations, we introduce PLAID (Physics-Learning AI Datamodel), a flexible and extensible framework for representing and sharing datasets of physics simulations. PLAID defines a unified standard for describing simulation data and is accompanied by a library for creating, reading, and manipulating complex datasets across a wide range of physical use cases (gitlab.com/drti/plaid). We release six carefully crafted datasets under the PLAID standard, covering structural mechanics and computational fluid dynamics, and provide baseline benchmarks using representative learning methods. Benchmarking tools are made available on Hugging Face, enabling direct participation by the community and contribution to ongoing evaluation efforts (huggingface.co/PLAIDcompetitions).
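To make the idea of a unified datamodel more concrete, here is a minimal Python sketch of what a physics-agnostic sample container could look like. It is not the PLAID library's API: the `Sample` and `Dataset` classes, their attributes, and the JSON serialization are all hypothetical names chosen for illustration, and the real standard may organize meshes, fields, and scalars quite differently.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Sample:
    """Hypothetical container for one simulation run (not the PLAID API)."""

    nodes: list[list[float]]                                      # mesh node coordinates
    fields: dict[str, list[float]] = field(default_factory=dict)  # one value per node
    scalars: dict[str, float] = field(default_factory=dict)       # global inputs/outputs

    def add_field(self, name: str, values: list[float]) -> None:
        # A nodal field must provide exactly one value per mesh node.
        if len(values) != len(self.nodes):
            raise ValueError(
                f"field {name!r} has {len(values)} values for {len(self.nodes)} nodes"
            )
        self.fields[name] = list(values)


@dataclass
class Dataset:
    """Hypothetical collection of samples that share one physics setting."""

    physics: str
    samples: list[Sample] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts the nested dataclasses to plain dicts.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> Dataset:
        data = json.loads(text)
        samples = [Sample(**raw) for raw in data["samples"]]
        return cls(physics=data["physics"], samples=samples)


# Usage: the same container describes a CFD sample or a structural one;
# only the field and scalar names change.
sample = Sample(nodes=[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
sample.add_field("pressure", [101.3, 101.1, 100.9])
sample.scalars["inlet_velocity"] = 3.5

dataset = Dataset(physics="computational_fluid_dynamics", samples=[sample])
restored = Dataset.from_json(dataset.to_json())
```

The design choice illustrated here is that nothing in the container is specific to one physics domain, which is the property the abstract attributes to a unified standard. A production format would more plausibly store large arrays in a binary container rather than JSON.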

Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale, diverse physics simulation datasets for AI

Existing datasets are limited in scope and standardization

Need for a flexible framework for sharing physics simulation data

Innovation

Methods, ideas, or system contributions that make the work stand out.

PLAID framework for standardized physics simulation datasets

Unified datamodel for diverse physical use cases

Open benchmarks on Hugging Face for community participation

🔎 Similar Papers

Machine Learning with Physics Knowledge for Prediction: A Survey