HLTCOE at LiveRAG: GPT-Researcher using ColBERT retrieval

📅 2025-06-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses open-domain question answering with a retrieval-augmented generation (RAG) framework that combines ColBERT's multi-vector retrieval with GPT-Researcher–style progressive reasoning. Methodologically, it employs a ColBERT bi-encoder coupled with a PLAID-X compressed index for efficient multilingual retrieval; applies the lightweight m2-bert-80M-8k-retrieval model to filter noisy passages; refines queries with Qwen2.5-7B-Instruct; and generates final answers with Falcon3-10B conditioned on up to nine retrieved passages. The key contribution is an end-to-end, multi-model RAG pipeline that balances retrieval accuracy, computational efficiency, and generation robustness. In LiveRAG's automated evaluation, the system placed fifth in correctness with a score of 1.07.

📝 Abstract
The HLTCOE LiveRAG submission used the GPT-Researcher framework to research the context of the question, filter the returned results, and generate the final answer. The retrieval system was a ColBERT bi-encoder architecture, which represents each passage with multiple dense token vectors. Retrieval used a local, compressed index of the FineWeb10-BT collection created with PLAID-X, using a model fine-tuned for multilingual retrieval. Queries were generated from context with Qwen2.5-7B-Instruct, and filtering was performed with m2-bert-80M-8k-retrieval. Up to nine passages were used as context to generate an answer with Falcon3-10B. The system placed 5th in the LiveRAG automatic evaluation for correctness with a score of 1.07.
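The staged pipeline described in the abstract (query generation → ColBERT retrieval → filtering → answer generation) can be sketched as a minimal Python skeleton. Every function body below is a toy stand-in for the actual component (Qwen2.5-7B-Instruct, ColBERT over a PLAID-X index, m2-bert-80M-8k-retrieval, Falcon3-10B), and all names are illustrative, not the authors' code.

```python
def generate_queries(question):
    # Stand-in for Qwen2.5-7B-Instruct query generation from context.
    return [question]

def retrieve(query, corpus):
    # Stand-in for ColBERT retrieval over a PLAID-X compressed index:
    # here, naive keyword-overlap scoring over an in-memory corpus.
    terms = set(query.lower().split())
    scored = [(len(terms & set(p.lower().split())), p) for p in corpus]
    return [p for score, p in sorted(scored, reverse=True) if score > 0]

def filter_passages(question, passages, k=9):
    # Stand-in for the m2-bert-80M-8k-retrieval relevance filter;
    # the system used at most nine passages as context.
    return passages[:k]

def generate_answer(question, passages):
    # Stand-in for Falcon3-10B answer generation over the context.
    return passages[0] if passages else "No answer found."

def answer_question(question, corpus):
    # End-to-end pipeline: queries -> retrieval -> filtering -> answer.
    candidates = []
    for q in generate_queries(question):
        candidates.extend(retrieve(q, corpus))
    context = filter_passages(question, candidates)
    return generate_answer(question, context)
```

The value of this shape is that each stage is swappable: the retrieval stub can be replaced by a real ColBERT searcher and the generation stubs by LLM calls without changing the control flow.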
Problem

Research questions and friction points this paper addresses.

Improving question context research using the GPT-Researcher framework
Enhancing retrieval with a ColBERT bi-encoder dense-token architecture
Optimizing multilingual retrieval with a PLAID-X index built from a fine-tuned model
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT-Researcher framework for context research
ColBERT bi-encoder for dense token retrieval
Qwen2.5-7B-Instruct for query generation
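The dense-token retrieval listed above relies on ColBERT's late-interaction ("MaxSim") scoring, which can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: real ColBERT operates over learned, normalized token embeddings and batches the computation.

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: each query token vector is matched
    against its best (max dot-product) document token vector, and the
    per-token maxima are summed into one relevance score."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-D token embeddings: the doc covering both query tokens scores higher.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]   # matches only the first
print(maxsim_score(query, doc_a))  # 2.0
print(maxsim_score(query, doc_b))  # 1.0
```

Because each passage is stored as many token vectors rather than one pooled vector, this scoring is what PLAID-X's compressed index is designed to make tractable at collection scale.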