🤖 AI Summary
Current web-based LLM chatbots predominantly rely on proprietary solutions, resulting in high deployment barriers, low transparency, and unclear energy efficiency. This paper introduces Talk2X, an open-source toolkit enabling rapid integration of lightweight, auditable chatbots into arbitrary websites. The approach addresses these limitations through three core contributions: (1) a novel, automated vector store construction mechanism tailored to webpage content; (2) a lightweight RAG architecture that maintains high retrieval accuracy while significantly reducing computational overhead; and (3) a hybrid usability evaluation framework. Experiments in an open-science repository setting demonstrate that Talk2X reduces average user task completion time by 37% and improves answer accuracy by 22%. Moreover, it exhibits strong cross-site generalizability and superior energy efficiency. Talk2X thus fills a critical gap in the ecosystem by providing an efficient, transparent, and reproducible open-source Web RAG solution.
📝 Abstract
Integrated into websites, LLM-powered chatbots offer alternative means of navigation and information retrieval, leading to a shift in how users access information on the web. Yet predominantly closed-source solutions limit proliferation among web hosts and suffer from a lack of transparency with regard to implementation details and energy efficiency. In this work, we propose our openly available agent Talk2X, which leverages an adapted retrieval-augmented generation (RAG) approach combined with an automatically generated vector database, benefiting energy efficiency. Talk2X's architecture is generalizable to arbitrary websites, offering developers a ready-to-use tool for integration. Using a mixed-methods approach, we evaluated Talk2X's usability by tasking users to acquire specific assets from an open science repository. Compared to standard user-website interaction, Talk2X significantly improved task completion time, correctness, and user experience, supporting users in quickly pinpointing specific information. Our findings contribute technical advancements to an ongoing paradigm shift in how we access information on the web.
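To make the general idea of retrieval over an automatically built vector store concrete, the following is a minimal sketch and not Talk2X's actual implementation. It assumes page text has already been extracted, and it substitutes a toy bag-of-words vector for a real embedding model. All function names, the example URLs, and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split page text into fixed-size word windows (assumed chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_store(pages):
    """Build an in-memory vector store from a {url: text} mapping."""
    store = []
    for url, text in pages.items():
        for chunk in chunk_text(text):
            store.append({"url": url, "text": chunk, "vec": embed(chunk)})
    return store


def retrieve(store, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda e: cosine(q, e["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(query, hits):
    """Assemble the retrieved chunks into a prompt for a language model."""
    context = "\n".join(f"[{h['url']}] {h['text']}" for h in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical two-page website used only for illustration.
pages = {
    "https://example.org/datasets": "Download the survey dataset as a CSV file from the datasets page.",
    "https://example.org/about": "The repository is maintained by volunteers and funded by grants.",
}
store = build_store(pages)
question = "Where can I download the dataset CSV?"
hits = retrieve(store, question, k=1)
prompt = build_prompt(question, hits)
```

In a real deployment, the `embed` function would be replaced by a learned embedding model and the resulting prompt would be sent to an LLM; the sketch stops before that step because the abstract does not specify either component.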