MobileRAG: A Fast, Memory-Efficient, and Energy-Efficient Method for On-Device RAG

📅 2025-07-01
🤖 AI Summary
To address the challenges of deploying retrieval-augmented generation (RAG) on memory- and energy-constrained mobile devices, this paper proposes the first fully local, efficient, low-overhead on-device RAG framework. Methodologically, it combines EcoVector, a lightweight vector retrieval algorithm that partitions the index and loads only the needed chunks on demand, with Selective Content Reduction (SCR), a technique that filters irrelevant text out of retrieved passages so they fit the input limits of compact language models, substantially reducing computational overhead. Its key contributions are (i) the first end-to-end, offline-capable on-device RAG system and (ii) significant efficiency gains over baseline approaches (42% lower latency, 58% smaller memory footprint, and 37% lower energy consumption) without compromising generation accuracy. The framework preserves privacy, responds in real time, and uses resources efficiently, offering a practical paradigm for edge intelligence.
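The idea of keeping the index partitioned and loading only the chunks a query needs can be illustrated with a small IVF-style sketch. This is not EcoVector itself: the summary does not describe its partitioning or scoring, so the class `PartitionedIndex`, the helper `build_index`, the `nprobe` parameter, and the use of fixed centroids with inner-product similarity are all hypothetical choices. Here an in-memory dict stands in for partition files on flash storage; a real on-device index would read each probed partition from disk and release it after scoring.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity between two vectors."""
    return sum(x * y for x, y in zip(a, b))


class PartitionedIndex:
    """Hypothetical IVF-style index: centroids stay resident, partitions are fetched on demand."""

    def __init__(self, centroids: List[Vector], store: Dict[int, List[Tuple[str, Vector]]]):
        self.centroids = centroids  # small, always kept in memory
        self._store = store  # stand-in for partition files on flash storage
        self.loads = 0  # counts partition fetches, to show how little is touched

    def _load_partition(self, pid: int) -> List[Tuple[str, Vector]]:
        # A real on-device index would read this partition from disk here.
        self.loads += 1
        return self._store.get(pid, [])

    def search(self, query: Vector, k: int, nprobe: int) -> List[str]:
        # Rank partitions by centroid similarity and probe only the top `nprobe`.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: dot(query, self.centroids[i]), reverse=True)
        candidates: List[Tuple[float, str]] = []
        for pid in order[:nprobe]:
            partition = self._load_partition(pid)
            candidates.extend((dot(query, vec), doc_id) for doc_id, vec in partition)
            # `partition` is rebound on the next iteration, so the search loop
            # holds a reference to only one fetched partition at a time.
        candidates.sort(reverse=True)
        return [doc_id for _, doc_id in candidates[:k]]


def build_index(docs: Dict[str, Vector], centroids: List[Vector]) -> PartitionedIndex:
    """Assign each document vector to its most similar centroid."""
    store: Dict[int, List[Tuple[str, Vector]]] = {}
    for doc_id, vec in docs.items():
        pid = max(range(len(centroids)), key=lambda i: dot(vec, centroids[i]))
        store.setdefault(pid, []).append((doc_id, vec))
    return PartitionedIndex(centroids, store)


centroids = [[1.0, 0.0], [0.0, 1.0]]
docs = {"a": [0.9, 0.1], "b": [0.8, 0.3], "c": [0.1, 0.95], "d": [0.2, 0.7]}
index = build_index(docs, centroids)
print(index.search([1.0, 0.0], k=2, nprobe=1), index.loads)  # prints ['a', 'b'] 1
```

With `nprobe=1` only one of the two partitions is fetched, which is the source of the memory and CPU savings the summary describes; raising `nprobe` trades more loading for better recall.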

📝 Abstract
Retrieval-Augmented Generation (RAG) has proven effective on server infrastructures, but its application on mobile devices is still underexplored due to limited memory and power resources. Existing vector search and RAG solutions largely assume abundant computation resources, making them impractical for on-device scenarios. In this paper, we propose MobileRAG, a fully on-device pipeline that overcomes these limitations by combining a mobile-friendly vector search algorithm, EcoVector, with a lightweight Selective Content Reduction (SCR) method. By partitioning and partially loading index data, EcoVector drastically reduces both memory footprint and CPU usage, while the SCR method filters out irrelevant text to reduce the language model (LM) input size without degrading accuracy. Extensive experiments demonstrate that MobileRAG significantly outperforms conventional vector search and RAG methods in terms of latency, memory usage, and power consumption, while maintaining accuracy and enabling offline operation to safeguard privacy in resource-constrained environments.
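The abstract says SCR filters out irrelevant text to shrink the LM input. A minimal sketch of that kind of filtering follows, assuming sentence-level granularity, relevance measured as the number of query terms a sentence shares, and a fixed word budget. These are illustrative stand-ins: the function `reduce_content`, its `budget_words` parameter, and the term-overlap score are hypothetical, and the paper's SCR may judge relevance differently (for example, with embeddings).

```python
import re
from typing import List, Set


def _terms(text: str) -> Set[str]:
    """Lower-cased alphanumeric terms of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def reduce_content(query: str, passages: List[str], budget_words: int) -> str:
    """Keep only query-relevant sentences from retrieved passages, within a word budget."""
    query_terms = _terms(query)
    sentences: List[str] = []
    for passage in passages:
        # Naive sentence split on ., ! or ? followed by whitespace.
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip())
    # Score each sentence by how many query terms it shares.
    scored = [(len(query_terms & _terms(s)), i, s) for i, s in enumerate(sentences)]
    # Sentences sharing no query term are treated as irrelevant and dropped;
    # the rest are ranked by score, with earlier sentences winning ties.
    relevant = sorted((t for t in scored if t[0] > 0), key=lambda t: (-t[0], t[1]))
    kept, used = [], 0
    for _, i, s in relevant:
        n_words = len(s.split())
        if used + n_words <= budget_words:  # skip sentences that would overflow the budget
            kept.append((i, s))
            used += n_words
    kept.sort()  # restore the original reading order
    return " ".join(s for _, s in kept)


passages = [
    "EcoVector partitions the index. The weather was sunny.",
    "Only needed partitions are loaded into memory. Lunch was served at noon.",
]
print(reduce_content("How does EcoVector reduce memory?", passages, budget_words=20))
# prints: EcoVector partitions the index. Only needed partitions are loaded into memory.
```

Restoring the original sentence order after selection keeps the reduced context readable for the LM, while the budget caps the prompt length regardless of how much text retrieval returns.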
Problem

Research questions and friction points this paper is trying to address.

Enabling efficient on-device RAG under mobile resource constraints
Reducing memory and CPU usage in mobile vector search
Maintaining accuracy while minimizing LM input size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mobile-friendly vector search algorithm EcoVector
Lightweight Selective Content Reduction method
Partitioned and partially loaded index data
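Why partial loading matters for memory can be shown with simple arithmetic. The sketch below compares a fully loaded flat index with one where only the partition centroids plus a single partition are resident. All inputs are hypothetical (100,000 vectors, 384 dimensions, 100 partitions, float32 values, perfectly balanced partitions, metadata ignored) and are not figures from the paper; the helper `resident_bytes` is likewise illustrative.

```python
import math


def resident_bytes(n_vectors: int, dim: int, n_partitions: int, bytes_per_value: int = 4):
    """Return (fully loaded, partially loaded) index memory in bytes.

    Assumes float32 values, balanced partitions, and that only the centroids
    plus one partition are resident at a time; ids and metadata are ignored.
    """
    full = n_vectors * dim * bytes_per_value
    centroids = n_partitions * dim * bytes_per_value
    one_partition = math.ceil(n_vectors / n_partitions) * dim * bytes_per_value
    return full, centroids + one_partition


full, partial = resident_bytes(n_vectors=100_000, dim=384, n_partitions=100)
print(full, partial)  # 153600000 1689600
```

Under these assumptions the resident index shrinks from about 153.6 MB to about 1.7 MB; in practice the savings depend on how many partitions each query probes and how evenly vectors are distributed.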
Authors
Taehwan Park
Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
Geonho Lee
Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
Min-Soo Kim
School of Computing, KAIST
Database, data mining, machine learning, bioinformatics, distributed/parallel computing