Large Scale Finite-Temperature Real-time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems

📅 2025-01-06
🤖 AI Summary
Efficiently simulating ultrafast, non-equilibrium electron dynamics in real materials at finite temperature, such as those induced by nonlinear optical excitation, remains a major challenge for real-time time-dependent density functional theory (rt-TDDFT). Previous rt-TDDFT simulations in the parallel transport gauge have been limited to low-temperature systems with band gaps. Method: This work removes that limitation by enabling large-scale rt-TDDFT simulations with hybrid functionals at finite temperature. It proposes the Parallel Transport-Implicit Midpoint (PT-IM) algorithm and combines it with Adaptively Compressed Exchange (ACE), a customized diagonalization method, ring-based communication, a shared-memory mechanism, and optimization for both GPU and ARM platforms. Contribution/Results: Implemented in the plane-wave code PWDFT, the approach achieves the first hybrid-functional, finite-temperature rt-TDDFT simulation of a 3072-atom system on 192 nodes, at 429.3 seconds per time step, which is 41.4× faster than the baseline. The summary reports this as the largest and most efficient implementation of its kind to date.

📝 Abstract
Ultrafast electronic phenomena arising at finite temperature, such as nonlinear optical excitation, can be simulated with high fidelity via real-time time-dependent density functional theory (rt-TDDFT) calculations with hybrid functionals. However, previous rt-TDDFT simulations of real materials using the optimal gauge, known as the parallel transport gauge, have been limited to low-temperature systems with band gaps. In this paper, we introduce the parallel transport-implicit midpoint (PT-IM) method, which significantly accelerates finite-temperature rt-TDDFT calculations of real materials with hybrid functionals. We first implement PT-IM with hybrid functionals in our plane-wave code PWDFT and optimize it on both GPU and ARM platforms to build a solid baseline code. Next, we propose a diagonalization method that reduces computation and communication complexity, and we employ the adaptively compressed exchange (ACE) method to reduce how often the Fock exchange operator, the most expensive component, must be applied. Finally, we adopt a ring-based method to overlap computation with communication and a shared-memory mechanism to reduce memory consumption. Numerical results show that our optimized code can reach 3072 atoms for rt-TDDFT simulation with hybrid functionals at finite temperature on 192 computing nodes; the time-to-solution for one time step is 429.3 s, which is 41.4 times faster than the baseline.
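To make the ACE step concrete, the following is a minimal NumPy sketch of the standard adaptively compressed exchange construction. It is not taken from PWDFT. The dense random matrix `vx` is a hypothetical stand-in for the Fock exchange operator, and the names `build_ace` and `apply_ace` are illustrative. Given W = V_x Ψ, the sketch forms M = Ψ^H W, factorizes -M = L L^H, and sets ξ = W L^{-H}, so that the low-rank operator -ξ ξ^H reproduces V_x on the orbitals Ψ.

```python
import numpy as np

def build_ace(psi, w):
    """Return xi such that -xi @ xi^H acts like V_x on the columns of psi."""
    m = psi.conj().T @ w                 # M = Psi^H V_x Psi (Hermitian, negative definite)
    m = 0.5 * (m + m.conj().T)           # symmetrize against round-off
    l = np.linalg.cholesky(-m)           # -M = L L^H
    # xi = W L^{-H}; solve L xi^H = W^H rather than forming an inverse
    return np.linalg.solve(l, w.conj().T).conj().T

def apply_ace(xi, phi):
    """Apply the compressed operator V_ACE = -xi xi^H to phi."""
    return -xi @ (xi.conj().T @ phi)

# Toy setup (assumption): a random Hermitian negative-definite matrix plays V_x.
rng = np.random.default_rng(0)
n, k = 40, 6
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
vx = -(a @ a.conj().T) - np.eye(n)
psi, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))

w = vx @ psi                             # the one expensive "exact exchange" application
xi = build_ace(psi, w)
rel_err = np.linalg.norm(apply_ace(xi, psi) - w) / np.linalg.norm(w)
print(f"relative error of V_ACE on psi: {rel_err:.2e}")
```

In this sketch the costly product `vx @ psi` is computed once, and later applications go through the rank-`k` factor `xi`. That is the sense in which ACE lowers how often the full exchange operator has to be applied.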
Problem

Research questions and friction points this paper is trying to address.

Real-time Time-Dependent Density Functional Theory
Finite Temperature
Ultrafast Electronic Phenomena
Innovation

Methods, ideas, or system contributions that make the work stand out.

PT-IM method
rt-TDDFT acceleration
Finite-temperature simulation
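As a rough illustration of the implicit midpoint part of PT-IM, the sketch below propagates a single orbital under i dψ/dt = H(ψ) ψ with the implicit midpoint rule, solved by fixed-point iteration. It deliberately omits the parallel transport gauge and finite-temperature occupations. The Hamiltonian (a tridiagonal hopping matrix plus a local density term with strength `g`) and all function names are hypothetical choices for the example, not the paper's model.

```python
import numpy as np

def hamiltonian(psi, h0, g):
    """Toy state-dependent Hamiltonian: H0 plus a local density term (assumption)."""
    return h0 + g * np.diag(np.abs(psi) ** 2)

def implicit_midpoint_step(psi, h0, g, dt, tol=1e-13, max_iter=200):
    """One step of psi_new = psi - i*dt*H(psi_mid) @ psi_mid, psi_mid = (psi + psi_new)/2."""
    psi_new = psi.copy()
    for _ in range(max_iter):
        psi_mid = 0.5 * (psi + psi_new)
        psi_next = psi - 1j * dt * (hamiltonian(psi_mid, h0, g) @ psi_mid)
        if np.linalg.norm(psi_next - psi_new) < tol:
            return psi_next
        psi_new = psi_next
    return psi_new

# Eight-site chain with nearest-neighbour hopping; start localized on site 0.
n_sites, dt, n_steps = 8, 0.05, 200
h0 = -(np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
psi_t = np.zeros(n_sites, dtype=complex)
psi_t[0] = 1.0

for _ in range(n_steps):
    psi_t = implicit_midpoint_step(psi_t, h0, g=0.5, dt=dt)

norm_drift = abs(np.linalg.norm(psi_t) - 1.0)
print(f"norm drift after {n_steps} steps: {norm_drift:.2e}")
```

The implicit midpoint rule conserves the orbital norm up to the fixed-point tolerance, which is why the drift stays small even though each step is only second-order accurate in `dt`.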
Rongrong Liu
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
Zhuoqiang Guo
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
Qiuchen Sha
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
Tong Zhao
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
Haibo Li
professor of media technology, KTH
Wei Hu
School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
Lijun Liu
Department of Mechanical Engineering, Graduate School of Engineering, Osaka University, Osaka, Japan
Guangming Tan
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
Weile Jia
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China