AI Summary
Fault-tolerant quantum computing demands decoders that combine high logical accuracy with ultra-low latency, a trade-off that existing approaches struggle to achieve. This work proposes a co-designed algorithm-hardware scheme for quantum error correction decoding built around a new coset ensemble decoding mechanism. It integrates coset-consistent candidate generation, ensemble forest exploration, reverse-order elimination, and lossless graph compression to raise decoding accuracy while reducing algorithmic complexity. Alongside the algorithm, a time-multiplexed hardware architecture uses multi-bank memory hashing and hierarchical ID mapping so that resource usage does not grow in proportion to code distance. Evaluated under a circuit-level depolarizing noise model, the approach achieves a better accuracy-latency trade-off than MWPM and Union-Find decoders and reduces FPGA LUT consumption by up to 8.2× relative to reported UF-based decoder resources.
Abstract
Reliable large-scale quantum computation relies on fault-tolerant architectures, where quantum error correction (QEC) continuously extracts and decodes error syndromes in real time. A critical component in QEC is the decoder, a classical subsystem that must simultaneously deliver high logical accuracy and ultra-low latency. This paper presents a novel algorithm-hardware co-design that improves the accuracy-latency trade-off over existing approaches such as vanilla Minimum-Weight Perfect Matching (MWPM) and Union-Find (UF) decoders. At the algorithmic level, we introduce coset ensemble decoding, which improves UF decoding by explicitly exploiting logically equivalent cosets. Our method performs ensemble forest exploration to generate multiple coset-consistent candidates and aggregates them to approximate coset-level maximum-likelihood decoding. We further reduce computational and memory complexity via reverse-order elimination and lossless graph compression, without sacrificing accuracy. At the hardware level, we design a domain-specific architecture that temporally reuses resources, avoiding the code-distance-proportional resource growth in prior spatial architectures. Several optimizations, such as multi-bank memory hashing and hierarchical ID mapping, are proposed to mitigate pipeline stalls and memory conflicts under highly concurrent access patterns. Under a circuit-level depolarizing noise model, our co-design approach achieves a better accuracy-latency trade-off than prior MWPM- and UF-based decoders, while reducing FPGA LUT consumption by up to 8.2 times compared with reported UF-based decoder resources. The tunable candidate number further exposes a flexible design knob, enabling users to tailor decoding performance to the requirements of different fault-tolerant workloads. Our implementation is publicly available at https://github.com/IMSeonL/coset-ensemble-decoder.
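The aggregation step described in the abstract, where several coset-consistent candidates are combined to approximate coset-level maximum-likelihood decoding, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`coset_of`, `aggregate_cosets`), the overlap-parity coset label, and the independent-error weight `(p / (1 - p)) ** weight` are all assumptions chosen for illustration, and the candidate sets are hypothetical rather than produced by a real forest exploration.

```python
from collections import defaultdict


def coset_of(correction, logical_support):
    """Label a correction's coset by the parity of its overlap with a
    (hypothetical) logical operator support. Assumption for illustration."""
    return len(set(correction) & set(logical_support)) % 2


def aggregate_cosets(candidates, logical_support, p):
    """Sum approximate likelihoods of distinct candidates per coset.

    Each candidate is a set of qubit indices to flip. Its relative
    likelihood is taken as (p / (1 - p)) ** weight, which assumes
    independent errors with probability p (a simplification).
    Returns the coset with the largest total and the per-coset totals.
    """
    ratio = p / (1 - p)
    totals = defaultdict(float)
    seen = set()
    for corr in candidates:
        key = frozenset(corr)
        if key in seen:  # count each distinct candidate only once
            continue
        seen.add(key)
        totals[coset_of(key, logical_support)] += ratio ** len(key)
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Hypothetical candidates: one weight-2 correction in coset 1 and
# three weight-3 corrections in coset 0.
candidates = [{0, 1}, {2, 3, 4}, {2, 5, 6}, {4, 5, 7}]
best, totals = aggregate_cosets(candidates, logical_support={0, 8}, p=0.3)
```

In this toy input the single lowest-weight candidate lies in coset 1, yet the three heavier candidates in coset 0 together carry more total weight, so the aggregated choice differs from a minimum-weight choice. The number of candidates passed in plays the role of the tunable candidate number mentioned in the abstract.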