LoRA-Augmented Generation (LAG) for Knowledge-Intensive Language Tasks

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Efficiently leveraging pre-trained language model experts for knowledge-intensive tasks remains challenging, particularly under data-free inference constraints. Method: This paper proposes LoRA-Augmented Generation (LAG), a parameter-efficient multi-expert collaborative inference framework that requires no additional training or data access. LAG constructs a large-scale, task-specific library of LoRA adapters and employs retrieval-augmented dynamic routing to fuse multiple LoRA experts layer-wise and token-wise during generation, enabling fine-grained knowledge scheduling. It seamlessly integrates with RAG-style setups when external data is available, overcoming performance bottlenecks of conventional data-free approaches. Contribution/Results: Extensive experiments demonstrate that LAG consistently outperforms state-of-the-art data-free methods in both zero-data and data-available settings, validating its strong generalization capability and practical utility across diverse knowledge-intensive tasks.

📝 Abstract
The proliferation of fine-tuned language model experts for specific tasks and domains signals the need for efficient selection and combination methods. We propose LoRA-Augmented Generation (LAG) for leveraging large libraries of knowledge and task-specific LoRA adapters. LAG requires no additional training or access to data, and efficiently filters, retrieves, and applies experts on a per-token and layer basis. We evaluate LAG on various knowledge-intensive tasks, achieving superior performance over existing data-free methods. We explore scenarios where additional data is available, demonstrating LAG's compatibility with alternative solutions such as retrieval-augmented generation (RAG).
Problem

Research questions and friction points this paper is trying to address.

Efficient selection of task-specific language model experts
Combining knowledge-intensive LoRA adapters without training
Improving performance in data-free knowledge tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LoRA adapters for task-specific knowledge
Filters and retrieves experts on a per-token and per-layer basis
Compatible with retrieval-augmented generation (RAG)
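The per-token, per-layer routing described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the retrieval keys, the softmax weighting over top-k adapters, and the function names (`retrieve_experts`, `lag_layer_forward`) are all assumptions for demonstration, assuming each adapter in the library carries an embedding key and each selected expert contributes a weighted low-rank delta to a frozen linear sublayer.

```python
import numpy as np

def retrieve_experts(hidden, adapter_keys, top_k=2):
    """Score each LoRA adapter against one token's hidden state; keep top_k.

    hidden:       (d,) hidden state for one token at one layer
    adapter_keys: (n_adapters, d) one retrieval key per adapter (hypothetical)
    Returns indices and softmax-normalized weights of the selected experts.
    """
    scores = adapter_keys @ hidden                    # similarity per adapter
    idx = np.argsort(scores)[-top_k:]                 # indices of top_k experts
    w = np.exp(scores[idx] - scores[idx].max())       # stable softmax
    return idx, w / w.sum()

def lag_layer_forward(hidden, W, loras, adapter_keys, top_k=2):
    """One frozen linear sublayer with per-token LoRA fusion (sketch).

    hidden: (n_tokens, d_in) token hidden states
    W:      (d_out, d_in) frozen base weight
    loras:  list of (A, B) pairs, A: (r, d_in), B: (d_out, r)
    """
    out = hidden @ W.T                                # frozen base path
    for t in range(hidden.shape[0]):                  # route each token separately
        idx, weights = retrieve_experts(hidden[t], adapter_keys, top_k)
        for i, wgt in zip(idx, weights):
            A, B = loras[i]
            out[t] += wgt * (B @ (A @ hidden[t]))     # weighted low-rank delta
    return out
```

Because the adapters are only retrieved and linearly combined at inference time, no gradient updates or training data are needed, which matches the data-free setting the paper targets.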