🤖 AI Summary
Current large language models (LLMs) for crystal generation support only zero-shot inference and lack mechanisms for few-shot generation, i.e., leveraging a small number of target-property examples to guide structural design. To address this gap, we propose the first in-context-learning-enabled few-shot crystal generation framework. Our approach introduces three key innovations: (1) a space-group-aware crystal tokenization scheme that enhances symmetry-preserving structural representation; (2) a condition-structure-aware hybrid instruction tuning framework; and (3) a multi-task instruction tuning strategy. Evaluated on four standard crystal generation benchmarks, our method achieves significant improvements over state-of-the-art methods in both conditional and unconditional generation settings. Notably, it is the first to enable example-driven, controllable, and efficient crystal structure modeling, demonstrating robust generalization from limited property-structure exemplars while preserving physical validity and symmetry constraints.
📝 Abstract
Designing crystal materials with desired physicochemical properties remains a fundamental challenge in materials science. While large language models (LLMs) have demonstrated strong in-context learning (ICL) capabilities, existing LLM-based crystal generation approaches are limited to zero-shot scenarios and cannot benefit from few-shot examples. In contrast, human experts typically design new materials by modifying relevant known structures, a workflow that aligns closely with the few-shot ICL paradigm. Motivated by this, we propose CrystalICL, a novel model designed for few-shot crystal generation. Specifically, we introduce a space-group-based crystal tokenization method, which effectively reduces the complexity of modeling crystal symmetry in LLMs. We further introduce a condition-structure-aware hybrid instruction tuning framework and a multi-task instruction tuning strategy, enabling the model to better exploit ICL by capturing structure-property relationships from limited data. Extensive experiments on four crystal generation benchmarks demonstrate the superiority of CrystalICL over leading baseline methods on both conditional and unconditional generation tasks.
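To make the few-shot ICL setting concrete, the sketch below assembles a prompt from a handful of (property, structure) exemplars and asks the model to complete a structure for a new target property. This is a minimal illustration: the prompt template, the `band gap` property, and the `<sg…>`-prefixed structure strings are assumptions for illustration, not CrystalICL's actual space-group-based tokenization or instruction format.

```python
# Hypothetical sketch of few-shot in-context prompting for crystal generation.
# Property names and structure strings below are illustrative placeholders.

def build_fewshot_prompt(examples, target_property):
    """Assemble a few-shot prompt from (property, structure-string) exemplars."""
    lines = ["Generate a crystal structure with the given property.\n"]
    for prop, struct in examples:
        lines.append(f"Property: band gap = {prop:.2f} eV")
        lines.append(f"Structure: {struct}\n")
    # Final query: the property is given, the structure is left for the LLM.
    lines.append(f"Property: band gap = {target_property:.2f} eV")
    lines.append("Structure:")
    return "\n".join(lines)

exemplars = [
    (1.12, "<sg225> Na Cl | a=5.64"),  # placeholder tokenized structures
    (0.67, "<sg227> Si | a=5.43"),
]
prompt = build_fewshot_prompt(exemplars, target_property=2.00)
print(prompt)
```

The completed prompt would then be passed to the fine-tuned LLM, whose continuation after the final `Structure:` is decoded back into a crystal structure.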