🤖 AI Summary
This work addresses the trade-off between accuracy and inference efficiency in SO(3)-equivariant atomistic foundation models. The authors propose a structural pruning method that compresses a large pretrained checkpoint along both the channel and the representation-order dimensions. Each irreducible representation is kept or removed as a complete block, so the pruned model remains SO(3)-equivariant. The method also transfers to other SO(3)-equivariant architectures (SevenNet, eSCN) and can be combined with quantization and knowledge distillation. On the Matbench Discovery leaderboard, the pruned MACE-MP model outperforms the official small model trained from scratch on 7 of 9 metrics. The compressed MACE-MP and MACE-OFF models have 1.5–4× fewer parameters and require 2.5–4× less pre-training compute than training a small model from scratch. Across eight downstream datasets, fine-tuning the pruned model reduces energy and force errors by 70.1% and 34.4%, respectively, relative to training task-specific models from scratch.
📝 Abstract
SO(3)-equivariant graph neural networks have become the dominant paradigm for atomistic foundation models, achieving high accuracy and data efficiency by building rotational symmetry directly into the architecture. Yet the computational cost of their higher-order tensor operations forces a trade-off between model accuracy and inference efficiency. In this paper, we propose a structural pruning method for SO(3)-equivariant atomistic foundation models to bridge this accuracy-efficiency gap. The pruning is applied along the channel and order dimensions, with each irreducible representation kept or removed as a complete block, thereby retaining SO(3) equivariance. Starting from a large checkpoint, the pruned model substantially reduces the inference cost while retaining higher accuracy than an independently trained small model. The pruned MACE-MP model outperforms the official small model trained from scratch on 7 of 9 metrics on the Matbench Discovery leaderboard. In terms of efficiency, compressed MACE-MP and MACE-OFF models contain 1.5$\times$ to 4$\times$ fewer parameters and require 2.5$\times$ to 4$\times$ less pre-training compute than training a small model from scratch. For downstream applications, fine-tuning the pruned model reduces energy and force errors by 70.1% and 34.4%, respectively, compared to training task-specific models from scratch across eight representative downstream datasets. We demonstrate that the method generalizes to other SO(3)-equivariant architectures (SevenNet, eSCN) and can be combined with quantization and knowledge distillation for further gains.
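To make the block-wise pruning idea concrete, here is a minimal NumPy sketch of why removing whole irreducible representations preserves equivariance. It is an illustration under stated assumptions, not the authors' implementation. Features are stored as a dictionary from order l to an array of shape (channels, 2l+1). The norm-based importance score, the helper names (`channel_scores`, `select_mask`, `apply_mask`), and the chosen sizes are all hypothetical, since the abstract does not specify a pruning criterion. Random orthogonal matrices stand in for the real Wigner D-matrices that a rotation would induce on each order.

```python
import numpy as np


def channel_scores(feats):
    # feats maps order l -> array of shape (channels, 2l+1).
    # The L2 norm over the 2l+1 components is unchanged by any orthogonal
    # map on that axis (assumed placeholder criterion, not the paper's).
    return {l: np.linalg.norm(x, axis=-1) for l, x in feats.items()}


def select_mask(feats, max_order, keep_channels):
    # Order pruning: drop every order above max_order.
    # Channel pruning: keep the highest-scoring channels of each kept order.
    mask = {}
    for l, s in channel_scores(feats).items():
        if l > max_order:
            continue
        k = min(keep_channels, s.shape[0])
        mask[l] = np.sort(np.argsort(s)[-k:])
    return mask


def apply_mask(feats, mask):
    # Index only the channel axis, so each kept irrep keeps all 2l+1 components.
    return {l: feats[l][idx] for l, idx in mask.items()}


def random_orthogonal(n, rng):
    # Stand-in for a real Wigner D-matrix of size n = 2l+1 (assumption).
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q


rng = np.random.default_rng(0)
feats = {l: rng.normal(size=(8, 2 * l + 1)) for l in range(4)}  # orders 0..3

mask = select_mask(feats, max_order=2, keep_channels=4)
pruned = apply_mask(feats, mask)

# Scalars (l = 0) are left unchanged by a rotation; higher orders get a
# random orthogonal matrix acting on their 2l+1 components.
rots = {l: np.eye(1) if l == 0 else random_orthogonal(2 * l + 1, rng) for l in feats}
rotated = {l: x @ rots[l].T for l, x in feats.items()}

# Prune-then-rotate must equal rotate-then-prune for the fixed mask.
prune_then_rotate = {l: x @ rots[l].T for l, x in pruned.items()}
rotate_then_prune = apply_mask(rotated, mask)
commutes = all(np.allclose(prune_then_rotate[l], rotate_then_prune[l]) for l in mask)

print({l: x.shape for l, x in pruned.items()}, commutes)
```

Because the mask only ever indexes the channel axis or drops an entire order, it never mixes or truncates the 2l+1 components within an irrep, which is why the two orderings agree. Pruning individual components inside a block would break this property.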