🤖 AI Summary
This work addresses the high computational cost of solving the generalized eigenvalue problems that arise in large-scale density functional theory (DFT) calculations on exascale architectures. It introduces a data-driven framework that reformulates spectral prediction as a regression task for the coefficients of Chebyshev interpolating polynomials. Comparing all-atom and fragment-based structural representations, the approach uses kernel ridge regression, graph neural networks, and random forest models trained on a 2 TB dataset of protein dimers. Within the BigDFT software package, the predicted spectra serve as high-quality initial guesses that accelerate the early self-consistent field (SCF) iterations. Shifting the target from discrete eigenvalues to polynomial coefficients overcomes the dimensionality constraints of large-scale spectral prediction and lays the groundwork for dynamically optimizing rational filter-based eigensolvers such as FrASE, which is still in initial development.
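The core reformulation, which predicts Chebyshev coefficients instead of individual eigenvalues, can be illustrated with a minimal NumPy sketch. It assumes that the sorted eigenvalues are treated as a function of a normalized index t in [-1, 1]. This parametrization is our assumption and not necessarily the authors' choice. Under it, a spectrum of any length is compressed into `degree + 1` coefficients, giving a fixed-size target that a regressor can learn. The helpers `encode_spectrum` and `decode_spectrum` are hypothetical names.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def encode_spectrum(eigenvalues, degree):
    """Compress a spectrum of any length into degree + 1 Chebyshev coefficients.

    Hypothetical parametrization: sorted eigenvalues are viewed as a
    piecewise-linear function of the normalized index t in [-1, 1].
    """
    eigs = np.sort(np.asarray(eigenvalues, dtype=float))
    t = np.linspace(-1.0, 1.0, eigs.size)
    # chebinterpolate samples np.interp(x, t, eigs) at the Chebyshev points
    # of the first kind and returns the interpolating polynomial's coefficients.
    return C.chebinterpolate(np.interp, degree, args=(t, eigs))


def decode_spectrum(coeffs, n_eigenvalues):
    """Reconstruct n_eigenvalues approximate eigenvalues from the coefficients."""
    t = np.linspace(-1.0, 1.0, n_eigenvalues)
    return C.chebval(t, coeffs)


if __name__ == "__main__":
    # Synthetic, monotone "spectrum" used only for illustration.
    t = np.linspace(-1.0, 1.0, 5000)
    eigs = -20.0 + 5.0 * t + 2.0 * t**3
    coeffs = encode_spectrum(eigs, degree=8)
    approx = decode_spectrum(coeffs, eigs.size)
    print(coeffs.shape, np.max(np.abs(approx - eigs)))
```

The target length depends only on `degree` and not on the number of eigenvalues, so systems of different sizes share one output dimension. Real DFT spectra contain gaps, for example between occupied and unoccupied states, and a low-degree interpolant will be less accurate across such gaps than on this smooth synthetic example. The degree is therefore a tunable assumption.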
📝 Abstract
Simulating large molecular systems comprising thousands of atoms requires highly scalable methodologies. While modern Density Functional Theory (DFT) codes exhibit linear scaling, solving the associated large, sparse generalized eigenproblems remains a critical computational bottleneck on exascale architectures. In the context of the LimitX project, we propose a data-driven framework to accelerate these calculations. By shifting the machine learning target from discrete eigenvalues to the coefficients of an interpolating Chebyshev polynomial, and by comparing both all-atom and fragment-based structural representations, we successfully overcome the dimensionality constraints of large-scale spectral prediction. We investigate three machine learning models (Kernel Ridge Regression, Graph Neural Networks, and Random Forests) trained on a novel 2 TB dataset of protein dimers. The predicted spectra provide initial guesses that effectively bypass early Self-Consistent Field (SCF) iterations in BigDFT. Ultimately, these spectral predictors will be deployed to dynamically optimize upcoming rational filter-based eigensolvers, such as FrASE, which is currently in initial development.
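Of the three models, kernel ridge regression is the simplest to sketch. The example below maps a fixed-length structural descriptor of each system to a vector of Chebyshev coefficients using a Gaussian kernel. The descriptors, the kernel choice, and the hyperparameters `gamma` and `lam` are illustrative assumptions. The class `ChebCoeffKRR` is hypothetical and is not the LimitX or BigDFT implementation. All coefficient columns share one kernel matrix, so a single linear solve fits every output at once.

```python
import numpy as np


def gaussian_kernel(A, B, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


class ChebCoeffKRR:
    """Kernel ridge regression: descriptors -> Chebyshev coefficient vectors.

    Hypothetical sketch; descriptor design and hyperparameters are assumptions.
    """

    def __init__(self, gamma=1.0, lam=1e-3):
        self.gamma = gamma  # inverse squared length scale of the kernel
        self.lam = lam      # ridge regularization strength

    def fit(self, X, Y):
        self.X_train = np.asarray(X, dtype=float)
        Y = np.asarray(Y, dtype=float)
        # Center the targets so the regularized model shrinks toward the mean.
        self.y_mean = Y.mean(axis=0)
        K = gaussian_kernel(self.X_train, self.X_train, self.gamma)
        # Solve (K + lam * I) alpha = Y - mean for all coefficient columns at once.
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(K)), Y - self.y_mean)
        return self

    def predict(self, X):
        K = gaussian_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K @ self.alpha + self.y_mean
```

Calling `ChebCoeffKRR(gamma=2.0).fit(X_train, coeffs_train).predict(X_new)` returns one coefficient vector per new structure. Those vectors can then be decoded into an approximate spectrum. Exact KRR solves an n × n system, and that cost grows cubically with the number of training structures. This becomes significant for a dataset of this size.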