DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Small language models (SLMs) suffer from limited factual knowledge, severe hallucination, and difficulty integrating retrieval-augmented generation (RAG). Method: We propose the first LLM-to-SLM RAG capability distillation framework, featuring a dual-path mechanism—evidence chain distillation and knowledge graph alignment—augmented by multi-stage response consistency constraints and a privacy-aware RAG architecture to ensure high-fidelity factual knowledge transfer. Contribution/Results: This work is the first to systematically distill RAG capabilities from large language models (LLMs) to SLMs; simultaneously mitigates hallucination and user privacy risks; and introduces a dedicated RAG evaluation benchmark for SLMs. Experiments demonstrate up to 27.7% higher factual accuracy over MiniRAG across multiple benchmarks, while significantly reducing model size and computational overhead—achieving both efficient inference and trustworthy generation.

Technology Category

Application Category

📝 Abstract

Retrieval-Augmented Generation (RAG) methods have proven highly effective for tasks requiring factual consistency and robust knowledge retrieval. However, large-scale RAG systems consume significant computational resources and are prone to generating hallucinated content from Humans. In this work, we introduce $ exttt{DRAG}$, a novel framework for distilling RAG knowledge from large-scale Language Models (LLMs) into small LMs (SLMs). Our approach leverages evidence- and knowledge graph-based distillation, ensuring that the distilled model retains critical factual knowledge while significantly reducing model size and computational cost. By aligning the smaller model's predictions with a structured knowledge graph and ranked evidence, $ exttt{DRAG}$ effectively mitigates hallucinations and improves factual accuracy. We further present a case demonstrating how our framework mitigates user privacy risks and introduce a corresponding benchmark. Experimental evaluations on multiple benchmarks demonstrate that our method outperforms the prior competitive RAG methods like MiniRAG for SLMs by up to 27.7% using the same models, preserving high-level efficiency and reliability. With $ exttt{DRAG}$, we provide a practical and resource-efficient roadmap to deploying enhanced retrieval and generation capabilities in small-sized LLMs.

Problem

Research questions and friction points this paper is trying to address.

Distilling RAG knowledge from LLMs to SLMs efficiently

Mitigating hallucination via evidence and graph-based distillation

Reducing computational cost while maintaining factual accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distills RAG knowledge from LLMs to SLMs

Uses evidence and knowledge graph-based distillation

Reduces model size and computational cost

🔎 Similar Papers

No similar papers found.

Authors to Follow