🤖 AI Summary
This work proposes a data-driven strategy to correct the significant errors introduced by the tensor hypercontraction (THC) approximation in third-order Møller–Plesset perturbation theory (MP3) calculations of dynamical correlation energies. Machine learning models, in particular nonlinear kernel ridge regression, are trained to correct THC-MP3 energy errors using the main-group chemistry database MGCDB84. Both absolute and relative correction schemes are evaluated, and the corrections substantially improve accuracy: root-mean-square errors with respect to canonical MP3 are reduced by a factor of 6–9 for total molecular energies and by a factor of 2–3 for reaction energies. By improving accuracy while retaining the reduced cost of the THC approximation, the method offers a promising pathway toward the practical use of THC-based quantum chemical methods.
📝 Abstract
Wavefunction-based quantum methods are among the most accurate tools for predicting and analyzing the electronic structure of molecules, in particular for capturing dynamical electron correlation. However, most methods that include dynamical correlation beyond the relatively simple level of second-order Møller–Plesset perturbation theory (MP2) are too computationally expensive to apply to large molecules. Approximations that reduce the scaling with system size, such as the tensor hypercontraction (THC) technique of Hohenstein et al., are a potential remedy, but they also introduce additional sources of error. In this work, we correct errors in THC-approximated methods using machine learning. Specifically, we apply THC to third-order Møller–Plesset perturbation theory (MP3) as a simplified model for coupled cluster with single and double excitations (CCSD), and we train several regression models on the THC errors observed for the Main Group Chemistry Database (MGCDB84). We compare the performance of multiple linear regression models and nonlinear kernel ridge regression models. We also investigate correction procedures based on absolute and relative corrections and evaluate the corrections for both total molecular energies and reaction energies. We assess the potential of regression techniques to correct THC-MP3 errors by comparing the corrected energies with canonical MP3 reference values, and we identify the most accurate technique. We find that nonlinear regression models reduce the root-mean-square errors between THC-MP3 and canonical MP3 by a factor of 6–9 for total molecular energies and by a factor of 2–3 for reaction energies.
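The abstract names an absolute and a relative correction scheme and a nonlinear kernel ridge regressor without giving their details. The sketch below shows one way such a correction could be set up; it is not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic "THC" and "canonical" energies (not MGCDB84 data), the three molecular descriptors, the Gaussian kernel and its hyperparameters `lam` and `gamma`, the definition of the relative correction as the fractional error (E_MP3 − E_THC) / E_THC, and the made-up reactions.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


class KernelRidge:
    """Minimal kernel ridge regression: solve (K + lam*I) c = y - mean(y)."""

    def __init__(self, lam=1e-3, gamma=1.0):  # hypothetical hyperparameters
        self.lam, self.gamma = lam, gamma

    def fit(self, X, y):
        self.X_train, self.y_mean = X, y.mean()
        K = rbf_kernel(X, X, self.gamma)
        self.coef = np.linalg.solve(K + self.lam * np.eye(len(X)), y - self.y_mean)
        return self

    def predict(self, X):
        return rbf_kernel(X, self.X_train, self.gamma) @ self.coef + self.y_mean


def rmse(pred, ref):
    return float(np.sqrt(np.mean((pred - ref) ** 2)))


# Hypothetical synthetic data standing in for MGCDB84 molecules (not real results).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))              # hypothetical molecular descriptors
e_thc = -1.5 - 0.5 * X[:, 0]                            # hypothetical THC-MP3 energies (Eh)
thc_error = 1e-3 * (np.sin(2 * X[:, 1]) + X[:, 2] ** 2)  # hypothetical smooth THC error
e_mp3 = e_thc + thc_error + rng.normal(0.0, 1e-5, 200)  # hypothetical canonical MP3
train, test = slice(0, 150), slice(150, 200)

# Absolute scheme: learn the energy difference E_MP3 - E_THC directly.
abs_model = KernelRidge().fit(X[train], (e_mp3 - e_thc)[train])
e_abs = e_thc[test] + abs_model.predict(X[test])

# Relative scheme (assumed form): learn the fractional error (E_MP3 - E_THC) / E_THC.
rel_model = KernelRidge().fit(X[train], ((e_mp3 - e_thc) / e_thc)[train])
e_rel = e_thc[test] * (1.0 + rel_model.predict(X[test]))

rmse_raw = rmse(e_thc[test], e_mp3[test])
rmse_abs = rmse(e_abs, e_mp3[test])
rmse_rel = rmse(e_rel, e_mp3[test])
print(f"RMSE reduction: absolute {rmse_raw / rmse_abs:.1f}x, relative {rmse_raw / rmse_rel:.1f}x")


def reaction_energies(energies, reactions):
    """Each reaction is a list of (molecule index, stoichiometric coefficient)."""
    return np.array([sum(nu * energies[i] for i, nu in rxn) for rxn in reactions])


# Hypothetical reactions A + B -> C among the 50 test molecules (indices 0..49).
reactions = [[(k, -1), (k + 1, -1), (k + 2, 1)] for k in range(0, 48, 3)]
ref_rxn = reaction_energies(e_mp3[test], reactions)
rxn_raw = rmse(reaction_energies(e_thc[test], reactions), ref_rxn)
rxn_abs = rmse(reaction_energies(e_abs, reactions), ref_rxn)
```

Because a reaction energy is a stoichiometry-weighted sum of molecular energies, this sketch needs no separate model for reactions: the per-molecule corrected energies are simply propagated through `reaction_energies`.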