MobileRAG: A Fast, Memory-Efficient, and Energy-Efficient Method for On-Device RAG

📅 2025-07-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deploying retrieval-augmented generation (RAG) on mobile devices is difficult because memory and energy are tightly constrained. This paper proposes MobileRAG, a RAG framework that runs entirely on the device. It combines EcoVector, a lightweight vector retrieval algorithm that loads chunked index partitions on demand, with Selective Content Reduction (SCR), a technique that compresses retrieved passages to fit the input limits of a compact language model, reducing computational overhead. The key contributions are: (i) an end-to-end, offline-capable on-device RAG system, presented as the first of its kind; and (ii) efficiency gains over baseline approaches of 42% lower latency, 58% smaller memory footprint, and 37% lower energy consumption, without loss of generation accuracy. Because the pipeline runs offline, user data stays on the device, which supports privacy alongside real-time responsiveness and resource efficiency.

📝 Abstract

Retrieval-Augmented Generation (RAG) has proven effective on server infrastructures, but its application on mobile devices is still underexplored due to limited memory and power resources. Existing vector search and RAG solutions largely assume abundant computation resources, making them impractical for on-device scenarios. In this paper, we propose MobileRAG, a fully on-device pipeline that overcomes these limitations by combining a mobile-friendly vector search algorithm, EcoVector, with a lightweight Selective Content Reduction (SCR) method. By partitioning and partially loading index data, EcoVector drastically reduces both memory footprint and CPU usage, while the SCR method filters out irrelevant text to diminish Language Model (LM) input size without degrading accuracy. Extensive experiments demonstrated that MobileRAG significantly outperforms conventional vector search and RAG methods in terms of latency, memory usage, and power consumption, while maintaining accuracy and enabling offline operation to safeguard privacy in resource-constrained environments.
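
The abstract's idea of partitioning an index and loading only part of it can be illustrated with a minimal sketch. This is not the paper's EcoVector algorithm, whose details are not given here; it is a generic cluster-partitioned search in which each partition lives in its own file and only the partitions closest to the query are read from disk. The function names, the JSON file layout, and the use of randomly sampled centroids instead of trained ones are all assumptions made for illustration.

```python
import json
import math
import os
import random
import tempfile


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_partitions(vectors, n_parts, out_dir, seed=0):
    """Assign each vector to its nearest centroid and write one file per partition."""
    rng = random.Random(seed)
    # Assumption: centroids are sampled at random rather than trained with k-means.
    centroids = rng.sample(vectors, n_parts)
    buckets = [[] for _ in range(n_parts)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(n_parts), key=lambda p: l2(vec, centroids[p]))
        buckets[nearest].append((idx, vec))
    for p, bucket in enumerate(buckets):
        with open(os.path.join(out_dir, f"part_{p}.json"), "w") as f:
            json.dump(bucket, f)
    # Only the small centroid table needs to stay resident in memory.
    return centroids


def search(query, centroids, out_dir, k=3, n_probe=2):
    """Read only the n_probe partitions nearest the query and rank their vectors."""
    probe = sorted(range(len(centroids)), key=lambda p: l2(query, centroids[p]))[:n_probe]
    candidates = []
    for p in probe:
        with open(os.path.join(out_dir, f"part_{p}.json")) as f:
            for idx, vec in json.load(f):
                candidates.append((l2(query, vec), idx))
    candidates.sort()
    # Return the ids of the k nearest candidates and how many partitions were read.
    return [idx for _, idx in candidates[:k]], len(probe)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.random() for _ in range(4)] for _ in range(60)]
    with tempfile.TemporaryDirectory() as index_dir:
        cents = build_partitions(data, n_parts=6, out_dir=index_dir)
        ids, loaded = search(data[17], cents, index_dir)
        print(f"nearest ids: {ids}, partitions read: {loaded} of {len(cents)}")
```

The trade-off in a design like this is governed by `n_probe`: reading fewer partitions lowers memory and I/O per query but raises the chance of missing a true nearest neighbor that sits in an unread partition.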

Problem

Research questions and friction points this paper is trying to address.

Enabling efficient on-device RAG for mobile resource constraints

Reducing memory and CPU usage in mobile vector search

Maintaining accuracy while minimizing LM input size

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mobile-friendly vector search algorithm EcoVector

Lightweight Selective Content Reduction method

Partitioned and partially loaded index data
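
Selective Content Reduction, listed above, is described only as filtering irrelevant text so that the LM input shrinks. A minimal sketch of that general idea follows, under loud assumptions: it scores each sentence of the retrieved passages by word overlap with the query and keeps the best ones within a word budget. The overlap score, the stopword list, the `max_words` parameter, and the function names are hypothetical stand-ins and are not taken from the paper.

```python
import re

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "does", "how", "in", "is", "of", "on", "that", "the", "to", "was"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_terms(text):
    """Set of words in the text that are not stopwords."""
    return {w for w in tokenize(text) if w not in STOPWORDS}


def reduce_passages(query, passages, max_words=40):
    """Keep the sentences that overlap most with the query, within a word budget."""
    q_terms = content_terms(query)
    scored = []
    for p_idx, passage in enumerate(passages):
        sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
        for s_idx, sent in enumerate(sentences):
            n_words = len(tokenize(sent))
            if n_words == 0:
                continue
            overlap = len(q_terms & content_terms(sent)) / len(q_terms) if q_terms else 0.0
            scored.append((overlap, p_idx, s_idx, sent, n_words))
    kept, used = [], 0
    # Visit sentences from most to least relevant; the sort is stable on ties.
    for overlap, p_idx, s_idx, sent, n_words in sorted(scored, key=lambda t: -t[0]):
        if overlap == 0.0 or used + n_words > max_words:
            continue
        kept.append((p_idx, s_idx, sent))
        used += n_words
    kept.sort()  # restore original reading order before joining
    return " ".join(sent for _, _, sent in kept)


if __name__ == "__main__":
    retrieved = [
        "EcoVector loads index partitions on demand. The weather was sunny that day.",
        "Battery drain matters on phones. Partitions of the index stay on disk until needed.",
    ]
    print(reduce_passages("How does the index load partitions?", retrieved))
```

A real system would more plausibly score sentences with embeddings than with word overlap; the point of the sketch is only the shape of the step, in which retrieved text is pruned before it reaches the language model.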
