🤖 AI Summary
Understanding how neural network performance scales with training data size, model capacity, and computational resources remains critical for efficient materials property prediction.
Method: We conduct large-scale ablation studies on multitask materials datasets using Transformer and EquiformerV2 architectures, employing mixed-precision training and adaptive learning rate scheduling.
Contribution/Results: We empirically establish, for the first time in materials modeling, a data-driven power-law scaling relationship, $L = \alpha \cdot N^{-\beta}$, between validation loss $L$ and training dataset size $N$. This law quantitatively characterizes the diminishing returns in predictive performance as data volume increases. The derived scaling law provides a reproducible, generalizable theoretical foundation and practical framework for optimizing training resource allocation, guiding large-model architecture design, and accelerating discovery of novel materials.
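The diminishing returns can be read directly off the power law. The short derivation below is a sketch that assumes $\alpha > 0$ and $\beta > 0$. The value $\beta = 0.5$ used in the numerical illustration is hypothetical and is not a fitted result from this work.

$$
\log L = \log \alpha - \beta \log N,
\qquad
\frac{L(2N)}{L(N)} = \frac{\alpha \, (2N)^{-\beta}}{\alpha \, N^{-\beta}} = 2^{-\beta}.
$$

On log-log axes the law is a straight line with slope $-\beta$, and every doubling of $N$ multiplies the loss by the same factor $2^{-\beta}$. With the illustrative $\beta = 0.5$, that factor is $2^{-0.5} \approx 0.71$, so each doubling of the data removes about 29% of the remaining loss, and the absolute gain per doubling shrinks as the loss falls.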
📝 Abstract
Predicting material properties is crucial for designing better batteries, semiconductors, and medical devices. Deep learning helps scientists quickly find promising materials by predicting their energy, forces, and stresses. Companies scale up deep learning models in many domains, such as language modeling, and invest many millions of dollars in them. We analyze how scaling training data (giving models more information to learn from), model size (giving models more capacity to learn patterns), and compute (giving models more computational resources) affects neural network performance on material property prediction. Specifically, we train both Transformer and EquiformerV2 networks to predict material properties. We find empirical scaling laws for these models: we can predict how increasing each of the three factors (training data, model size, and compute) affects predictive performance. In particular, the loss $L$ can be modeled by a power-law relationship $L = \alpha \cdot N^{-\beta}$, where $\alpha$ and $\beta$ are constants and $N$ is the factor being scaled. We also provide command-line arguments for changing training settings such as the number of epochs, the maximum learning rate, and whether mixed precision is enabled. Future work could investigate scaling laws for other neural network models in this domain, such as GemNet and fully connected networks, to assess how they compare to the models we trained.
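As an illustration of how constants like $\alpha$ and $\beta$ could be estimated from measured (size, loss) pairs, the sketch below fits a straight line in log-log space with NumPy. It is a minimal example under stated assumptions, not the fitting procedure used in this work: the function names `fit_power_law` and `predict_loss` are hypothetical, and the data points are synthetic values generated from a known power law rather than experimental results.

```python
import numpy as np


def fit_power_law(n, loss):
    """Estimate (alpha, beta) in loss = alpha * n**(-beta).

    Uses ordinary least squares on the log-transformed data, since
    log(loss) = log(alpha) - beta * log(n) is linear in log(n).
    """
    n = np.asarray(n, dtype=float)
    loss = np.asarray(loss, dtype=float)
    slope, intercept = np.polyfit(np.log(n), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(n, alpha, beta):
    """Evaluate the fitted power law at new values of n."""
    return alpha * np.asarray(n, dtype=float) ** (-beta)


# Synthetic data (NOT results from the paper): exact power law with
# alpha = 2.0 and beta = 0.3, sampled at four dataset sizes.
sizes = np.array([1e3, 1e4, 1e5, 1e6])
losses = 2.0 * sizes ** (-0.3)

alpha, beta = fit_power_law(sizes, losses)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")

# Extrapolate to a dataset ten times larger than the largest one observed.
print(f"predicted loss at N = 1e7: {predict_loss(1e7, alpha, beta):.4f}")
```

Fitting in log space weights relative errors equally across scales. With noisy measurements, a nonlinear least-squares fit on the raw losses can give somewhat different constants, so the choice of fitting method is worth reporting alongside the fitted values.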