🤖 AI Summary
This work addresses the challenge of reduced question-answering accuracy in multi-release software systems, where documentation across releases is highly similar yet contains subtle differences that confuse existing QA systems. To tackle this issue, the authors propose QAMR, a chatbot built on a retrieval-augmented generation (RAG) framework tailored to multi-release documentation. The framework combines a dual-chunking strategy, which tunes chunk sizes separately for retrieval and for answer generation, with query rewriting and context-selection mechanisms. Evaluated on both real-world industrial data and a public benchmark, QAMR achieves an answer correctness of 88.5% and a retrieval accuracy of 90%, improvements of 16.5% and 12% over a baseline RAG chatbot, respectively, while also reducing response time by 8%.
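The dual-chunking idea described above retrieves over small chunks but hands the language model the larger surrounding context for generation. A minimal sketch of one way to realize this, assuming a parent-chunk mapping and using a toy token-overlap scorer as a stand-in for QAMR's actual retriever (all names and sizes here are illustrative, not from the paper):

```python
def build_chunks(sentences, gen_size=4, ret_size=2):
    """Split a document (a list of sentences) into large generation
    chunks, then subdivide each into small retrieval chunks.
    Returns (retrieval_chunks, parent_of), where parent_of maps a
    retrieval-chunk index to the text of its enclosing generation chunk."""
    gen_chunks = [sentences[i:i + gen_size] for i in range(0, len(sentences), gen_size)]
    retrieval_chunks, parent_of = [], {}
    for gchunk in gen_chunks:
        parent_text = " ".join(gchunk)
        for j in range(0, len(gchunk), ret_size):
            parent_of[len(retrieval_chunks)] = parent_text
            retrieval_chunks.append(" ".join(gchunk[j:j + ret_size]))
    return retrieval_chunks, parent_of

def retrieve_context(query, retrieval_chunks, parent_of):
    """Score the small retrieval chunks by token overlap with the query
    (a stand-in for embedding similarity), then return the best match's
    larger parent chunk as context for answer generation."""
    q = set(query.lower().split())
    scores = [len(q & set(c.lower().split())) for c in retrieval_chunks]
    best = max(range(len(scores)), key=scores.__getitem__)
    return parent_of[best]
```

For example, with per-sentence retrieval chunks (`ret_size=1`) grouped into two-sentence generation chunks, a query matching a single sentence returns the whole surrounding pair as generation context, letting retrieval granularity and generation context be tuned independently.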
📝 Abstract
Companies regularly have to contend with multi-release systems, where several versions of the same software are in operation simultaneously. Question answering over documents from multi-release systems poses challenges because different releases have distinct yet overlapping documentation. Motivated by the observed inaccuracy of state-of-the-art question-answering techniques on multi-release system documents, we propose QAMR, a chatbot designed to answer questions across multi-release system documentation. QAMR enhances traditional retrieval-augmented generation (RAG) to ensure accuracy in the face of highly similar yet distinct documentation for different releases. It achieves this through a novel combination of pre-processing, query rewriting, and context selection. In addition, QAMR employs a dual-chunking strategy to enable separately tuned chunk sizes for retrieval and answer generation, improving overall question-answering accuracy. We evaluate QAMR using a public software-engineering benchmark as well as a collection of real-world, multi-release system documents from our industry partner, Ciena. Our evaluation yields five main findings: (1) QAMR outperforms a baseline RAG-based chatbot, achieving an average answer correctness of 88.5% and an average retrieval accuracy of 90%, which correspond to improvements of 16.5% and 12%, respectively. (2) An ablation study shows that QAMR's mechanisms for handling multi-release documents directly improve answer accuracy. (3) QAMR achieves a 19.6% average gain in answer correctness and a 14.0% average gain in retrieval accuracy over the best of its component-ablated variants. (4) QAMR reduces response time by 8% on average relative to the baseline. (5) The automatically computed accuracy metrics used in our evaluation strongly correlate with expert human assessments, validating the reliability of our methodology.
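Among the components named in the abstract, query rewriting is the one most directly aimed at the multi-release ambiguity problem. The abstract does not specify QAMR's actual rewriting rules, so the sketch below is an assumption for illustration only: it pulls a release identifier out of the query when one is mentioned (and known), falls back to a default release otherwise, and prefixes the query with it so retrieval can be restricted to that release's documents.

```python
import re

# Matches forms like "release 2.1", "version 3", or "v2.0" (illustrative).
_REL = re.compile(r"\b(?:release|version|v)\s*([0-9]+(?:\.[0-9]+)*)\b", re.I)

def rewrite_query(query, known_releases, default_release):
    """Hypothetical version-aware query rewriting: extract a known
    release identifier from the query (or fall back to a default) and
    return (rewritten_query, release) so the retriever can filter by
    release before scoring chunks."""
    m = _REL.search(query)
    release = m.group(1) if m and m.group(1) in known_releases else default_release
    cleaned = re.sub(r"\s{2,}", " ", _REL.sub("", query)).strip()
    return f"[release {release}] {cleaned}", release
```

In this design, the release tag is resolved before retrieval so that near-duplicate chunks from other releases never enter the candidate pool, which is one plausible way to counter the cross-release confusion the abstract describes.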