QBugLM: An Agentic Benchmarking Framework for LLM-based Quantum Software Debugging

📅 2026-06-05

🤖 AI Summary

This work addresses silent bugs in quantum software, which yield incorrect outputs without raising errors, evade conventional debugging techniques, and have lacked systematic evaluation for large language models (LLMs). The authors propose QBugLM, a multi-agent framework and benchmark for LLM-based quantum program debugging. Targeting OpenQASM 3.0 programs, it automates bug injection, detection, repair, and simulation-based validation, combining taxonomy-driven bug generation, several prompting strategies (including Chain-of-Thought and ReAct), and an iterative feedback mechanism. In a case study on two LLMs, simpler structured prompting can outperform Chain-of-Thought and ReAct for reasoning-capable models under fixed-resource constraints, and a single retry raises Pass@1 from below 25% to above 80%.
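To make the notion of a silent bug concrete, here is a minimal sketch, assuming a two-qubit Bell-state circuit and a swapped control/target bug chosen purely for illustration (it is not taken from the paper's taxonomy). The tiny state-vector simulator and all function names are hypothetical helpers, not part of QBugLM. Both circuits run without any error; only the output distribution reveals the bug.

```python
import math

def apply_h(state, q):
    """Apply a Hadamard gate to qubit q of a state vector (list of amplitudes)."""
    out = [0.0] * len(state)
    s = 1 / math.sqrt(2)
    for i, amp in enumerate(state):
        bit = (i >> q) & 1
        i0 = i & ~(1 << q)  # same basis state with qubit q set to 0
        i1 = i | (1 << q)   # same basis state with qubit q set to 1
        # H|0> = (|0> + |1>)/sqrt(2), H|1> = (|0> - |1>)/sqrt(2)
        out[i0] += s * amp
        out[i1] += s * amp * (-1 if bit else 1)
    return out

def apply_cx(state, control, target):
    """Apply a CNOT: flip the target qubit wherever the control qubit is 1."""
    out = list(state)
    for i in range(len(state)):
        if (i >> control) & 1:
            out[i ^ (1 << target)] = state[i]
    return out

def probabilities(state):
    """Map bitstrings (qubit 0 is the rightmost bit) to measurement probabilities."""
    n = len(state).bit_length() - 1
    return {
        format(i, f"0{n}b"): round(abs(a) ** 2, 6)
        for i, a in enumerate(state)
        if abs(a) ** 2 > 1e-12
    }

zero = [1.0, 0.0, 0.0, 0.0]  # |00>

# Intended circuit: h q[0]; cx q[0], q[1];
correct = probabilities(apply_cx(apply_h(zero, 0), control=0, target=1))

# Illustrative bug: control and target swapped, i.e. cx q[1], q[0];
buggy = probabilities(apply_cx(apply_h(zero, 0), control=1, target=0))

print("correct:", correct)
print("buggy:  ", buggy)
```

Neither run raises an exception, which is why a validation step has to compare output distributions rather than wait for a crash.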

📝 Abstract

Quantum software bugs often yield silent, incorrect outputs rather than explicit errors, making them particularly difficult to detect and repair with conventional techniques. Although large language models (LLMs) have shown strong performance on classical software engineering tasks, their ability to debug quantum code remains largely unexplored. To bridge this gap, we propose QBugLM, a multi-agent framework that automates the quantum software debugging pipeline, from taxonomy-driven bug injection to LLM-based detection and repair, and finally to simulation-based validation, for framework-agnostic OpenQASM 3.0 programs. We further conduct a comprehensive case study using QBugLM to benchmark two LLMs, Claude 4.6 Sonnet and Qwen3 Coder Next, across different prompting strategies, bug categories, and quantum programs. Our results show that iterative feedback is critical, as a single retry raises Pass@1 from below 25% to above 80%. Moreover, simpler structured prompting can even outperform Chain-of-Thought and ReAct for reasoning-capable models under fixed-resource constraints. Our work takes initial steps toward benchmarking LLM capabilities for debugging quantum programs and offers practical insights to support future efforts in automated quantum software repair.
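The repair-with-feedback loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `repair_fn` stands in for an LLM call, `simulate_fn` for a simulator, the total variation distance is one plausible way to compare output distributions, and the toy data exists only so the sketch runs. How the paper itself scores retries is not specified here; the sketch simply counts a success if it occurs within a given number of attempts.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def repair_with_feedback(buggy, reference_dist, repair_fn, simulate_fn,
                         max_attempts=2, tol=1e-6):
    """Return the 1-based attempt on which the repair validated, or None."""
    feedback = None
    candidate = buggy
    for attempt in range(1, max_attempts + 1):
        # Hypothetical LLM stand-in: proposes a fix given optional feedback.
        candidate = repair_fn(candidate, feedback)
        distance = total_variation(simulate_fn(candidate), reference_dist)
        if distance <= tol:
            return attempt
        feedback = (f"attempt {attempt} failed: output distribution differs "
                    f"from the reference (TV distance {distance:.3f})")
    return None

def pass_rate(outcomes, within):
    """Fraction of cases repaired within the given number of attempts."""
    return sum(1 for o in outcomes if o is not None and o <= within) / len(outcomes)

# Toy data (assumed for illustration only; not results from the paper).
REFERENCE = "h q[0]; cx q[0], q[1];"
BUGGY = "h q[0]; cx q[1], q[0];"
TOY_DISTS = {
    REFERENCE: {"00": 0.5, "11": 0.5},
    BUGGY: {"00": 0.5, "01": 0.5},
}

def toy_simulate(program):
    """Look up a precomputed distribution instead of running a simulator."""
    return TOY_DISTS[program]

def toy_repair(program, feedback):
    """Toy stand-in for an LLM that only fixes the bug once it sees feedback."""
    return REFERENCE if feedback else program

outcomes = [repair_with_feedback(BUGGY, TOY_DISTS[REFERENCE], toy_repair, toy_simulate)]
print("pass rate, first attempt only:", pass_rate(outcomes, within=1))
print("pass rate, one retry allowed: ", pass_rate(outcomes, within=2))
```

Passing the validation result back as text keeps the loop independent of any particular model or prompting strategy, so the same harness could in principle wrap different prompts.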

Problem

Research questions and friction points this paper is trying to address.

quantum software debugging

silent bugs

large language models

OpenQASM

automated repair

Innovation

Methods, ideas, or system contributions that make the work stand out.

quantum software debugging

large language models

multi-agent framework