🤖 AI Summary
Existing academic retrieval platforms are designed mainly for offline batch processing, which makes them ill-suited to the dynamic, multi-turn online retrieval that Retrieval-Augmented Generation (RAG) systems require. To bridge this gap, we propose RoutIR, a lightweight Python service framework that wraps arbitrary retrieval methods behind an HTTP API and supports dynamic composition of modules such as query expansion, reranking, and result fusion. The framework has built-in asynchronous query batching and result caching, giving RAG systems a general-purpose, composable, and efficient online retrieval pipeline. A modular engine abstraction and JSON-driven configuration allow diverse state-of-the-art retrieval models to be plugged in, which adds flexibility in complex interactive scenarios such as iterative reasoning, feedback loops, and agent collaboration. The implementation is publicly released.
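To make the batching and caching idea concrete, here is a minimal sketch in plain `asyncio`. It is not RoutIR's implementation: the class `BatchingSearcher`, its `max_wait` parameter, and the `fake_batch_search` backend are hypothetical names invented for illustration. Queries that arrive within a short window are grouped into one backend call, and repeated queries are answered from an in-memory cache.

```python
import asyncio


class BatchingSearcher:
    """Hypothetical helper: groups concurrent queries into one batched call."""

    def __init__(self, batch_search, max_wait=0.01):
        self.batch_search = batch_search  # async callable: list[str] -> list of results
        self.max_wait = max_wait          # seconds to wait for more queries
        self.cache = {}                   # query -> cached result
        self.pending = []                 # (query, future) pairs awaiting a flush
        self.flush_task = None

    async def search(self, query):
        if query in self.cache:           # cache hit: skip the backend entirely
            return self.cache[query]
        future = asyncio.get_running_loop().create_future()
        self.pending.append((query, future))
        if self.flush_task is None:       # first query in a window starts the timer
            self.flush_task = asyncio.create_task(self._flush_after_wait())
        return await future

    async def _flush_after_wait(self):
        await asyncio.sleep(self.max_wait)
        batch, self.pending = self.pending, []
        self.flush_task = None
        # Deduplicate while preserving arrival order.
        queries = list(dict.fromkeys(q for q, _ in batch))
        try:
            results = await self.batch_search(queries)
        except Exception as exc:          # propagate backend errors to every caller
            for _, future in batch:
                if not future.done():
                    future.set_exception(exc)
            return
        by_query = dict(zip(queries, results))
        self.cache.update(by_query)
        for query, future in batch:
            if not future.done():
                future.set_result(by_query[query])


async def demo():
    calls = []  # records each batch sent to the toy backend

    async def fake_batch_search(queries):
        calls.append(list(queries))
        return [f"results for {q}" for q in queries]

    searcher = BatchingSearcher(fake_batch_search)
    out = await asyncio.gather(
        searcher.search("a"), searcher.search("b"), searcher.search("a")
    )
    cached = await searcher.search("a")   # served from the cache
    return out, cached, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the three concurrent calls reach the backend as a single deduplicated batch, and the later repeat of the same query does not reach the backend at all. A production service would also need cache eviction and a cap on batch size, which are omitted here.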
📝 Abstract
Retrieval models are key components of Retrieval-Augmented Generation (RAG) systems, which generate search queries, process the returned documents, and generate a response. RAG systems are often dynamic and may involve multiple rounds of retrieval. While many state-of-the-art retrieval methods are available through academic IR platforms, these platforms are typically designed for the Cranfield paradigm, in which all queries are known up front and can be batch processed offline. This simplification accelerates research but leaves state-of-the-art retrieval models unable to support downstream applications that require online services, such as arbitrary dynamic RAG pipelines that involve looping, feedback, or even self-organizing agents. In this work, we introduce RoutIR, a Python package that provides a simple and efficient HTTP API for wrapping arbitrary retrieval methods, including first-stage retrieval, reranking, query expansion, and result fusion. Given a minimal JSON configuration file that specifies the retrieval models to serve, RoutIR can construct and query retrieval pipelines on the fly using any permutation of the available models (e.g., fusing the results of several first-stage retrieval methods and then reranking). The API automatically performs asynchronous query batching and caches results by default. While many state-of-the-art retrieval methods are already supported by the package, RoutIR is also easily extensible by implementing the Engine abstract class. The package is open-sourced and publicly available on GitHub: http://github.com/hltcoe/routir.
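The pipeline example in the abstract (fuse several first-stage runs, then rerank) can be sketched as follows. This is an illustration under stated assumptions, not RoutIR's API: `SketchEngine`, `StaticEngine`, `rrf_fuse`, and `run_pipeline` are hypothetical names, and the real `Engine` abstract class may have a different interface. Reciprocal rank fusion is used here only as one common fusion method; the abstract does not say which fusion methods the package provides.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Callable


class SketchEngine(ABC):
    """Hypothetical stand-in for an engine interface (not RoutIR's actual class)."""

    @abstractmethod
    def search(self, query: str, limit: int) -> dict[str, float]:
        """Return a mapping from document ID to retrieval score."""


class StaticEngine(SketchEngine):
    """Toy first-stage retriever that always returns a fixed ranking."""

    def __init__(self, ranking: list[str]):
        self.ranking = ranking

    def search(self, query: str, limit: int) -> dict[str, float]:
        # Higher-ranked documents receive higher scores.
        return {doc: 1.0 / (i + 1) for i, doc in enumerate(self.ranking[:limit])}


def rrf_fuse(runs: list[dict[str, float]], k: int = 60) -> dict[str, float]:
    """Reciprocal rank fusion: each run adds 1 / (k + rank) to a document's score."""
    fused: dict[str, float] = {}
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


def run_pipeline(
    query: str,
    engines: list[SketchEngine],
    rerank: Callable[[str, str], float],
    limit: int = 10,
) -> list[str]:
    """Fuse the first-stage runs, then reorder the top candidates with a reranker."""
    runs = [engine.search(query, limit) for engine in engines]
    fused = rrf_fuse(runs)
    candidates = sorted(fused, key=fused.get, reverse=True)[:limit]
    return sorted(candidates, key=lambda doc: rerank(query, doc), reverse=True)


if __name__ == "__main__":
    engines = [StaticEngine(["d1", "d2", "d3"]), StaticEngine(["d2", "d3", "d4"])]
    # Toy reranker (an assumption for the demo): prefer the document named "d4".
    print(run_pipeline("example query", engines, lambda q, d: float(d == "d4")))
```

Keeping every component behind one small interface is what lets stages be recombined freely: a different first-stage retriever or reranker only has to satisfy the same `search` or scoring signature.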