🤖 AI Summary
Efficiently simulating ultrafast, non-equilibrium electron dynamics, such as those induced by nonlinear optical excitation, in real materials at finite temperature remains a major challenge for real-time time-dependent density functional theory (rt-TDDFT). Previous rt-TDDFT simulations of real materials in the parallel transport gauge have been limited to low-temperature systems with band gaps.
Method: This work overcomes this limitation by enabling large-scale rt-TDDFT simulations with hybrid functionals at finite temperature. It introduces the Parallel Transport–Implicit Midpoint (PT-IM) method and combines it with adaptively compressed exchange (ACE), a diagonalization method that reduces computation and communication complexity, ring-based communication that overlaps with computation, a shared-memory mechanism that lowers memory consumption, and optimization for both GPU and ARM platforms.
Contribution/Results: Implemented within the plane-wave code PWDFT, our approach achieves the first hybrid-functional, finite-temperature rt-TDDFT simulation of a 3072-atom system on 192 nodes, with 429.3 seconds per time step—41.4× faster than the baseline. This represents the largest and most efficient implementation of its kind to date.
📝 Abstract
Ultrafast electronic phenomena at finite temperature, such as nonlinear optical excitation, can be simulated with high fidelity via real-time time-dependent density functional theory (rt-TDDFT) calculations with hybrid functionals. However, previous rt-TDDFT simulations of real materials using the optimal gauge, known as the parallel transport gauge, have been limited to low-temperature systems with band gaps. In this paper, we introduce the parallel transport-implicit midpoint (PT-IM) method, which significantly accelerates finite-temperature rt-TDDFT calculations of real materials with hybrid functionals. We first implement PT-IM with hybrid functionals in our plane-wave code PWDFT and optimize it on both GPU and ARM platforms to build a solid baseline code. Next, we propose a diagonalization method that reduces computation and communication complexity, and we employ the adaptively compressed exchange (ACE) method to reduce how often the Fock exchange operator, the most expensive part of the calculation, must be applied. Finally, we adopt a ring-based method to overlap computation and communication, and a shared-memory mechanism to alleviate memory consumption. Numerical results show that our optimized code can reach 3072 atoms for rt-TDDFT simulation with hybrid functionals at finite temperature on 192 computing nodes. The time-to-solution for one time step is 429.3 s, which is 41.4 times faster than the baseline.
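The abstract does not spell out how ACE is built, so the following is only a minimal sketch of the commonly described ACE construction, not the PWDFT implementation. It assumes a small dense Hermitian negative-definite matrix `vx` as a stand-in for the Fock exchange operator; the names `build_ace` and `apply_ace` and the toy sizes are hypothetical. The idea is that the expensive operator is applied once to the orbitals, giving `w = vx @ psi`, and a low-rank surrogate `-xi @ xi^H` then reproduces that action cheaply.

```python
import numpy as np

def build_ace(psi, w):
    """Return xi such that -xi @ xi^H maps psi to w (w = Vx @ psi)."""
    m = psi.conj().T @ w                 # k x k matrix psi^H Vx psi
    m = 0.5 * (m + m.conj().T)           # symmetrize against round-off
    chol = np.linalg.cholesky(-m)        # -M = L L^H (M is negative definite)
    # xi = W L^{-H}, obtained by solving L X = W^H and taking X^H
    return np.linalg.solve(chol, w.conj().T).conj().T

def apply_ace(xi, phi):
    """Apply the low-rank surrogate -xi xi^H to a block of vectors."""
    return -xi @ (xi.conj().T @ phi)

# Toy data (assumption): random negative-definite "exchange" matrix.
rng = np.random.default_rng(0)
n, k = 40, 6
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
vx = -(a @ a.conj().T) - np.eye(n)       # Hermitian, negative definite

# Orthonormal "orbitals" from a QR factorization of a random block.
psi, _ = np.linalg.qr(rng.standard_normal((n, k))
                      + 1j * rng.standard_normal((n, k)))

w = vx @ psi                             # the single expensive application
xi = build_ace(psi, w)

# The surrogate reproduces the exact action on the orbitals it was built from.
rel_err = np.linalg.norm(apply_ace(xi, psi) - w) / np.linalg.norm(w)
print("relative error on psi:", rel_err)
```

In this sketch the surrogate is exact only on the span of `psi`. In a time-propagation setting it would need to be rebuilt as the orbitals evolve, and how often that rebuild happens is the cost that the abstract says ACE is used to reduce.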