🤖 AI Summary
Existing equivariant diffusion models struggle to jointly model atom types, fractional coordinates, and lattice parameters within a unified end-to-end framework, and they cannot incorporate user-specified semantic constraints (e.g., “high electrical conductivity” or “thermal stability”) expressed in natural language. This work introduces TGDMat, the first text-guided, periodic E(3)-equivariant joint diffusion model for end-to-end generation of 3D periodic crystal structures conditioned on expert textual prompts. TGDMat combines a periodic E(3)-equivariant graph neural network, joint denoising of atom types, coordinates, and lattice, and cross-modal text–structure conditioning. Experiments show that TGDMat consistently outperforms state-of-the-art methods on both structure prediction and inverse design; on structure prediction it beats all baselines with just a single generated sample; and it reduces training and sampling overhead while improving the physical plausibility and chemical fidelity of generated crystals.
📝 Abstract
Equivariant diffusion models have emerged as the prevailing approach for generating novel crystal materials due to their ability to leverage the physical symmetries of periodic material structures. However, current models do not effectively learn the joint distribution of atom types, fractional coordinates, and lattice structure of the crystal material in a cohesive end-to-end diffusion framework. Moreover, none of these models work under realistic setups, where users specify the desired characteristics that the generated structures must match. In this work, we introduce TGDMat, a novel text-guided diffusion model designed for 3D periodic material generation. Our approach integrates global structural knowledge through textual descriptions at each denoising step while jointly generating atom coordinates, types, and lattice structure using a periodic-E(3)-equivariant graph neural network (GNN). Extensive experiments on benchmark tasks over popular datasets reveal that TGDMat outperforms existing baseline methods by a clear margin. Notably, for the structure prediction task, with just one generated sample, TGDMat outperforms all baseline models, highlighting the importance of text-guided diffusion. Further, in the generation task, TGDMat surpasses all baselines and their text-fusion variants, showcasing the effectiveness of the joint diffusion paradigm. Additionally, incorporating textual knowledge reduces overall training and sampling computational overhead while enhancing generative performance when real-world textual prompts from experts are utilized.
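The abstract describes three ideas that can be made concrete with a toy sketch: a single reverse-diffusion loop that jointly updates atom types, fractional coordinates (kept periodic by wrapping modulo 1), and the lattice matrix, with a text embedding injected at every denoising step. The sketch below is a minimal illustration of that control flow only; the `denoise_step` function, its conditioning scheme, and all shapes are hypothetical stand-ins for TGDMat's actual periodic-E(3)-equivariant GNN denoiser, which is not reproduced here.

```python
import numpy as np

def denoise_step(atom_logits, frac_coords, lattice, text_emb, t, rng):
    """Hypothetical single reverse-diffusion step (stand-in for the
    equivariant GNN denoiser). All three modalities are updated
    together, with the text embedding acting as a conditioning signal
    at this step -- purely illustrative dynamics, not the real model."""
    scale = 1.0 / (t + 1)
    cond = np.tanh(text_emb.mean())  # toy use of the text condition
    atom_logits = atom_logits + 0.01 * scale * cond * rng.standard_normal(atom_logits.shape)
    # wrap fractional coordinates back into [0, 1) to respect periodicity
    frac_coords = (frac_coords + 0.01 * scale * rng.standard_normal(frac_coords.shape)) % 1.0
    lattice = lattice + 0.01 * scale * rng.standard_normal(lattice.shape)
    return atom_logits, frac_coords, lattice

def sample(num_atoms=4, num_types=5, steps=10, seed=0):
    """Run the joint denoising loop from noise to a discrete structure."""
    rng = np.random.default_rng(seed)
    text_emb = rng.standard_normal(8)            # stand-in text-encoder output
    atom_logits = rng.standard_normal((num_atoms, num_types))
    frac_coords = rng.random((num_atoms, 3))     # fractional, in [0, 1)
    lattice = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
    for t in reversed(range(steps)):             # joint denoising of all three
        atom_logits, frac_coords, lattice = denoise_step(
            atom_logits, frac_coords, lattice, text_emb, t, rng)
    atom_types = atom_logits.argmax(axis=1)      # discretize types at the end
    return atom_types, frac_coords, lattice
```

The key structural point is that types, coordinates, and lattice are denoised in one loop rather than in separate stages, and the text condition enters every step, mirroring the "joint diffusion" and "text-guided" claims of the abstract.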