🤖 AI Summary
This work addresses a limitation of existing quantum circuit routing methods: they ignore the current calibration status of hardware couplers, so routes that look efficient by gate count can still lose execution fidelity. The authors propose a calibration-aware graph reinforcement-learning router. It uses same-day calibration data to choose hardware-edge SWAPs that favor high-fidelity couplers, and its policy is trained with Proximal Policy Optimization (PPO). The router is evaluated with exact simulated fidelity on nine MQT Bench circuits across three IBM Heron r2 calibration snapshots. It reaches a pooled mean exact fidelity of 0.727, compared with 0.440 for SABRE-best20 and 0.481 for target-aware SABRE. These gains come with more routed two-qubit gates and are concentrated in the 5-qubit and 8-qubit circuit families. Under the fixed tree action graph, all 10-qubit families favor SABRE-best20. The results suggest that calibration-aware learned routing can improve fidelity beyond compilation driven only by gate count.
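For reference, the block below gives the standard clipped surrogate objective that PPO maximizes. It is the textbook form only, not the paper's exact training loss: the excerpt does not specify the reward, any value-function or entropy terms, or the clip parameter $\epsilon$. Reading the action $a_t$ as the choice of a hardware-edge SWAP, and the state $s_t$ as the current qubit mapping together with calibration features, is an assumption made for illustration.

$$
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}, \qquad
L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\; \operatorname{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\Big)\right]
$$

Here $\hat{A}_t$ is an advantage estimate. Clipping the probability ratio $r_t(\theta)$ limits how far a single update can move the routing policy away from the policy that collected the data.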
📝 Abstract
Quantum circuit routing is a key step in compiling programs for noisy intermediate-scale quantum processors. Routes that appear efficient by standard overhead metrics can still lose fidelity when they pass through poorly calibrated couplers. We study a calibration-aware graph reinforcement-learning router that uses same-day IBM Heron r2 calibration data to choose hardware-edge SWAPs. We train the policy with proximal policy optimization and evaluate it with exact simulated fidelity across nine Munich Quantum Toolkit (MQT) Bench circuits and three calibration snapshots. Across these evaluations, pooled mean exact fidelity is $0.727$, compared with $0.440$ for SABRE-best20 and $0.481$ for target-aware SABRE. Fidelity gains come with higher routed two-qubit gate counts and are concentrated in the 5q and 8q circuit families; under the fixed tree action graph, all 10q families favor SABRE-best20. Overall, our results show that calibration-aware learned routing can improve fidelity beyond gate-count-driven compilation.
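The point that a route with fewer SWAPs can still have lower fidelity can be illustrated with a simple classical baseline, not the authors' learned PPO router. The sketch weights each coupler by $-\log$ of its estimated SWAP fidelity and runs Dijkstra's algorithm, which maximizes the product of fidelities along the route. It then compares that route with the fewest-SWAP (breadth-first) route. It rests on several assumptions. The error rates are hypothetical, not Heron r2 calibration data. A SWAP is taken as three two-qubit gates. Gate errors are treated as independent, so route fidelity is a product. Every function name here is hypothetical. This is a rough estimate, not the paper's exact-fidelity simulation.

```python
import heapq
import math
from collections import deque

# Hypothetical coupling map: (u, v) -> two-qubit gate error rate.
# Illustrative numbers only, NOT real IBM Heron r2 calibration data.
ERRORS = {
    (0, 1): 0.003,
    (1, 2): 0.080,  # a poorly calibrated coupler
    (0, 3): 0.003,
    (3, 4): 0.003,
    (4, 2): 0.003,
}


def neighbors(errors):
    """Build an undirected adjacency list: node -> [(neighbor, error), ...]."""
    adj = {}
    for (u, v), err in errors.items():
        adj.setdefault(u, []).append((v, err))
        adj.setdefault(v, []).append((u, err))
    return adj


def swap_fidelity(err):
    # Assumption: one SWAP = three two-qubit gates with independent errors.
    return (1.0 - err) ** 3


def path_fidelity(path, errors):
    """Estimated fidelity of moving a qubit along `path`, one SWAP per edge."""
    f = 1.0
    for u, v in zip(path, path[1:]):
        err = errors.get((u, v), errors.get((v, u)))
        f *= swap_fidelity(err)
    return f


def _unwind(prev, dst):
    """Rebuild the path from a predecessor map."""
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def fewest_swaps_path(src, dst, errors):
    """Breadth-first search: fewest SWAPs, calibration ignored."""
    adj = neighbors(errors)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, _ in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return _unwind(prev, dst)


def most_reliable_path(src, dst, errors):
    """Dijkstra with edge cost -log(SWAP fidelity): maximizes product of fidelities."""
    adj = neighbors(errors)
    dist = {src: 0.0}
    prev = {src: None}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, err in adj[node]:
            nd = d - math.log(swap_fidelity(err))
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return _unwind(prev, dst)


if __name__ == "__main__":
    for name, finder in [("fewest SWAPs", fewest_swaps_path),
                         ("calibration-aware", most_reliable_path)]:
        path = finder(0, 2, ERRORS)
        print(name, path, "swaps:", len(path) - 1,
              "est. fidelity: %.3f" % path_fidelity(path, ERRORS))
```

With these made-up numbers, the fewest-SWAP route `[0, 1, 2]` uses 2 SWAPs and has an estimated fidelity of about 0.772. The calibration-aware route `[0, 3, 4, 2]` uses 3 SWAPs and has an estimated fidelity of about 0.973. Trading extra two-qubit gates for better couplers is the same kind of trade-off the abstract reports. A learned policy would make such choices step by step over a whole circuit rather than for a single qubit move.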