🤖 AI Summary
Efficiently leveraging pre-trained language model experts for knowledge-intensive tasks remains challenging, particularly when no training data is available at inference time. Method: This paper proposes LoRA-Augmented Generation (LAG), a parameter-efficient multi-expert inference framework that requires no additional training or data access. LAG builds a large library of knowledge- and task-specific LoRA adapters and uses retrieval-based routing to filter, select, and fuse LoRA experts on a per-token and per-layer basis during generation, enabling fine-grained knowledge scheduling. When external data is available, LAG also integrates with retrieval-augmented generation (RAG) setups. Contribution/Results: Experiments on diverse knowledge-intensive tasks show that LAG outperforms existing data-free methods, and that it remains compatible with data-dependent alternatives such as RAG.
📝 Abstract
The proliferation of fine-tuned language model experts for specific tasks and domains signals the need for efficient selection and combination methods. We propose LoRA-Augmented Generation (LAG) for leveraging large libraries of knowledge and task-specific LoRA adapters. LAG requires no additional training or access to data, and efficiently filters, retrieves, and applies experts on a per-token and layer basis. We evaluate LAG on various knowledge-intensive tasks, achieving superior performance over existing data-free methods. We explore scenarios where additional data is available, demonstrating LAG's compatibility with alternative solutions such as retrieval-augmented generation (RAG).
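The abstract describes retrieving and applying LoRA experts per token and per layer, but gives no implementation details. The following is a minimal, hypothetical sketch of that idea, not the paper's actual method: all names, dimensions, the cosine-similarity scoring, and the softmax fusion are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R, N_EXPERTS, TOP_K = 16, 4, 8, 2  # hidden dim, LoRA rank, library size, experts per token (illustrative)

# Hypothetical LoRA library: each expert holds low-rank factors (A, B)
# and a retrieval key embedding (how keys are built is not specified here).
library = [
    {
        "A": rng.normal(scale=0.1, size=(R, D)),  # down-projection
        "B": rng.normal(scale=0.1, size=(D, R)),  # up-projection
        "key": rng.normal(size=D),                # embedding used for retrieval
    }
    for _ in range(N_EXPERTS)
]

def route_and_apply(h):
    """Retrieve the top-k LoRA experts for one token's hidden state at one
    layer (cosine similarity against expert keys, an assumed scoring rule)
    and add their weighted low-rank updates to the base activation."""
    keys = np.stack([e["key"] for e in library])
    sims = keys @ h / (np.linalg.norm(keys, axis=1) * np.linalg.norm(h) + 1e-9)
    top = np.argsort(sims)[-TOP_K:]
    weights = np.exp(sims[top]) / np.exp(sims[top]).sum()  # softmax over retrieved experts
    delta = sum(w * (library[i]["B"] @ (library[i]["A"] @ h)) for w, i in zip(weights, top))
    return h + delta

h = rng.normal(size=D)       # hidden state of one token at one layer
out = route_and_apply(h)     # routing would repeat per token and per layer
print(out.shape)
```

In this toy version, routing is just nearest-key lookup followed by a weighted sum of low-rank updates; the point is only that expert selection can happen independently at each token and layer, with no extra training.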