🤖 AI Summary
This work addresses crystal property prediction, which is hindered by scarce labeled data and limited model generalization. The authors propose CrysLDNet, a framework that they present as the first to bring latent-space diffusion pretraining to this domain. By pretraining a variational autoencoder (VAE) together with a latent diffusion model on large-scale unlabeled crystal structures, the method learns representations that capture structural and chemical semantics. The pretrained graph encoder is then fine-tuned for downstream property prediction tasks. CrysLDNet mitigates data scarcity, improving on existing baselines by 4.26% and 4.90% on the JARVIS and Materials Project datasets, respectively. The learned representations also remain robust in low-data regimes and, when fine-tuned on a small amount of experimental data, can correct errors in DFT-computed properties.
📝 Abstract
Fast and accurate prediction of crystal properties is a central challenge in new materials design. Graph neural networks and Transformer-based models have emerged as powerful tools for this task due to their ability to encode the local structural environment of atoms within a crystal. However, these models are data-hungry, and in practice, labeled data for crystal properties are scarce. Pretraining-finetuning strategies, particularly those based on diffusion models, have shown promise in addressing these limitations. In this work, we introduce a novel latent-diffusion-based pretraining framework, CrysLDNet, designed to mitigate data scarcity. Our approach integrates a Variational Autoencoder (VAE) with a diffusion model during the pretraining stage. The VAE encoder maps 3D crystal structures into a smooth latent space within which the diffusion process is applied. This latent diffusion pretraining enables the graph encoder to effectively capture structural and chemical semantics from large-scale unlabeled data, and the encoder can then be finetuned for specific property prediction tasks. Comprehensive experiments on widely used DFT datasets for property prediction show that CrysLDNet significantly outperforms both training-from-scratch and pretrained baselines, with improvements of 4.26% and 4.90% on the JARVIS and Materials Project (MP) datasets, respectively. Additionally, the learned representations remain robust in sparse-data conditions and are expressive enough to correct DFT errors when finetuned with limited experimental data. Code is available at: https://github.com/shrimonmuke0202/CrysLDNet.git.
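The abstract does not specify the exact pretraining objective, noise schedule, or network architectures. As a rough illustration of what latent diffusion pretraining generally involves, here is a minimal NumPy sketch assuming a standard DDPM-style noise-prediction loss applied to VAE latents, plus a KL term on the VAE posterior. Everything here is hypothetical and not taken from the CrysLDNet code: the linear stand-ins `vae_encode` and `denoiser`, the linear beta schedule, `kl_weight`, and all function names. In the real framework, an encoder over 3D crystal structures would presumably supply the latent codes in place of these random linear maps.

```python
import numpy as np

# Hypothetical sketch of a latent-diffusion pretraining loss (DDPM-style).
# NOT the CrysLDNet implementation: encoder and denoiser are random linear stand-ins.

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; returns betas and cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def vae_encode(x, w_mu, w_logvar):
    """Stand-in VAE encoder: maps pooled structure features x to (mu, logvar)."""
    return x @ w_mu, x @ w_logvar

def reparameterize(mu, logvar, rng):
    """Sample z0 = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    """KL(q(z|x) || N(0, I)), summed over latent dims and averaged over the batch."""
    return float(np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)))

def q_sample(z0, t, alpha_bars, noise):
    """Forward diffusion: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * noise."""
    ab = alpha_bars[t][:, None]
    return np.sqrt(ab) * z0 + np.sqrt(1.0 - ab) * noise

def denoiser(z_t, t, w_eps, num_steps):
    """Stand-in noise predictor eps_theta(z_t, t); a real model would be a neural network."""
    t_feat = (t / num_steps)[:, None]
    return z_t @ w_eps + t_feat

def pretraining_loss(x, params, alpha_bars, rng, kl_weight=1e-3):
    """Return (total loss, diffusion loss) for one batch of features x."""
    mu, logvar = vae_encode(x, params["w_mu"], params["w_logvar"])
    z0 = reparameterize(mu, logvar, rng)
    t = rng.integers(0, len(alpha_bars), size=x.shape[0])  # random timestep per sample
    noise = rng.standard_normal(z0.shape)
    z_t = q_sample(z0, t, alpha_bars, noise)
    eps_pred = denoiser(z_t, t, params["w_eps"], len(alpha_bars))
    diffusion = float(np.mean((eps_pred - noise) ** 2))  # noise-prediction MSE
    return diffusion + kl_weight * kl_divergence(mu, logvar), diffusion

# Usage on random placeholder data (no real crystal structures involved).
rng = np.random.default_rng(0)
feat_dim, latent_dim, batch = 16, 8, 4
params = {
    "w_mu": rng.normal(scale=0.1, size=(feat_dim, latent_dim)),
    "w_logvar": rng.normal(scale=0.1, size=(feat_dim, latent_dim)),
    "w_eps": rng.normal(scale=0.1, size=(latent_dim, latent_dim)),
}
_, alpha_bars = linear_beta_schedule(1000)
x = rng.standard_normal((batch, feat_dim))
total, diff = pretraining_loss(x, params, alpha_bars, rng)
print(total, diff)
```

Running the diffusion process on compact latents rather than on raw atom coordinates and lattice parameters is the general motivation for latent diffusion; in training, the gradients of this loss would update both the encoder and the denoiser.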