🤖 AI Summary
To address the unreliability of general-purpose large language models (LLMs) in quantum computing, which stems from scarce domain-specific training data, this paper proposes C4Q 2.0, a trustworthy dialogue system tailored to quantum computing. Methodologically, it adopts a decoupled "classification-execution" architecture: a domain-specialized fine-tuned LLM achieves near-perfect intent classification, while a deterministic logic engine, which generates Qiskit code and performs symbolic reasoning, handles quantum circuit synthesis and algorithmic problem solving, ensuring verifiable, maintainable outputs. The system provides ready-to-run Qiskit code for gate definitions and circuit operations, and solves software engineering tasks such as the Traveling Salesman Problem (TSP) and the Knapsack Problem. A comparative study shows that C4Q 2.0 outperforms three existing quantum chatbots in correctness and maintainability.
📝 Abstract
Large language model (LLM)-based tools such as ChatGPT seem useful for classical programming assignments, but the more specialized the field, the more likely they are to lack reliability, owing to the scarcity of data available to train them. In the case of quantum computing, the quality of the answers generic chatbots provide is low. C4Q is a chatbot focused on quantum programs that addresses this challenge through a software architecture integrating specialized LLMs, which classify requests, with specialized question-answering modules backed by a deterministic logical engine, to provide trustworthy quantum computing support. This article describes the latest version (2.0) of C4Q, which delivers several enhancements: ready-to-run Qiskit code for gate definitions and circuit operations, expanded features for solving software engineering tasks such as the travelling salesperson problem and the knapsack problem, and a feedback mechanism for iterative improvement. Extensive testing of the backend confirms the system's reliability, and empirical evaluations show that C4Q 2.0's classification LLM reaches near-perfect accuracy. The evaluation concludes with a comparative study against three existing chatbots, highlighting C4Q 2.0's maintainability and correctness and reflecting on how software architecture decisions, such as separating deterministic logic from probabilistic text generation, affect the quality of the results.
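The core architectural idea, separating probabilistic intent classification from deterministic code generation, can be illustrated with a minimal sketch. This is not the paper's actual implementation: the function names, the keyword-based classifier stub (standing in for the fine-tuned LLM), and the template-based engine are all assumptions made for illustration.

```python
# Illustrative sketch of a "classification-execution" split (hypothetical
# names; the keyword classifier stands in for C4Q's fine-tuned LLM).

def classify_intent(request: str) -> str:
    """Stand-in for the classification LLM: map a user request
    to an intent label via simple keyword rules."""
    text = request.lower()
    if "apply" in text and "gate" in text:
        return "circuit_operation"
    if "gate" in text:
        return "gate_definition"
    return "unknown"

def generate_qiskit_code(intent: str, gate: str, qubit: int) -> str:
    """Deterministic logic engine: emit ready-to-run Qiskit code from
    fixed templates, so the answer is verifiable rather than sampled
    from a language model."""
    if intent == "circuit_operation":
        return "\n".join([
            "from qiskit import QuantumCircuit",
            f"qc = QuantumCircuit({qubit + 1})",
            f"qc.{gate}({qubit})",
            "print(qc.draw())",
        ])
    raise ValueError(f"unsupported intent: {intent}")

# The LLM only routes the request; the code it returns is templated.
intent = classify_intent("apply a Hadamard gate to qubit 0")
code = generate_qiskit_code(intent, gate="h", qubit=0)
print(code)
```

Because the generated code comes from templates rather than free-form text generation, its correctness can be tested deterministically, which is the maintainability argument the abstract makes.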