MRAG-Suite: A Diagnostic Evaluation Platform for Visual Retrieval-Augmented Generation

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing evaluation benchmarks for vision-oriented Retrieval-Augmented Generation (RAG) do not systematically model query difficulty and ambiguity, which hinders fine-grained diagnosis of how models fail on complex queries. Method: We introduce the first difficulty- and ambiguity-aware diagnostic evaluation platform for vision RAG, featuring a multi-granularity framework for quantifying query complexity and a claim-level hallucination detection tool, MM-RAGChecker. The platform unifies diverse benchmarks (WebQA, Chart-RAG, Visual-RAG, and MRAG-Bench) and uses controllable filtering to isolate high-difficulty and high-ambiguity samples. Contribution/Results: Experiments show that state-of-the-art models suffer substantial accuracy degradation (an average drop of 28.6%) on challenging queries. MM-RAGChecker attributes hallucinated claims to their root causes, such as retrieval errors, multimodal misalignment, or reasoning flaws, establishing an interpretable diagnostic paradigm for robustness analysis and targeted improvement of vision RAG systems.

📝 Abstract
Multimodal Retrieval-Augmented Generation (Visual RAG) significantly advances question answering by integrating visual and textual evidence. Yet, current evaluations fail to systematically account for query difficulty and ambiguity. We propose MRAG-Suite, a diagnostic evaluation platform integrating diverse multimodal benchmarks (WebQA, Chart-RAG, Visual-RAG, MRAG-Bench). We introduce difficulty-based and ambiguity-aware filtering strategies, alongside MM-RAGChecker, a claim-level diagnostic tool. Our results demonstrate substantial accuracy reductions under difficult and ambiguous queries, highlighting prevalent hallucinations. MM-RAGChecker effectively diagnoses these issues, guiding future improvements in Visual RAG systems.
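To make the filtering idea concrete, here is a minimal sketch of threshold-based sample selection. It is an illustration only, not the MRAG-Suite implementation: the `Sample` fields, the `[0, 1]` score ranges, and the function names `filter_samples` and `accuracy` are all assumptions, and the abstract does not say how difficulty or ambiguity scores are actually computed.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One evaluation query. All fields are hypothetical, for illustration."""
    query: str
    source: str        # benchmark the query came from, e.g. "WebQA"
    difficulty: float  # assumed score in [0, 1]; higher means harder
    ambiguity: float   # assumed score in [0, 1]; higher means more ambiguous


def filter_samples(samples, min_difficulty=0.0, min_ambiguity=0.0):
    """Keep only samples that meet both thresholds."""
    return [
        s for s in samples
        if s.difficulty >= min_difficulty and s.ambiguity >= min_ambiguity
    ]


def accuracy(correct_flags):
    """Fraction of correct answers; 0.0 for an empty subset."""
    return sum(correct_flags) / len(correct_flags) if correct_flags else 0.0
```

Comparing `accuracy` on the full pool against `accuracy` on the output of `filter_samples` is one simple way to measure how much a model degrades on the hard or ambiguous subset.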
Problem

Research questions and friction points this paper is trying to address.

Current evaluations do not systematically account for query difficulty and ambiguity
Hallucinations in Visual RAG systems are prevalent and hard to diagnose
Multimodal QA evaluation lacks difficulty- and ambiguity-aware filtering and diagnostic tooling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Platform integrates multimodal benchmarks for evaluation
Introduces difficulty-based and ambiguity-aware filtering strategies
Provides claim-level diagnostic tool for hallucination detection
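A claim-level check can be sketched as follows: split an answer into claims, test each claim against the retrieved evidence, and report the share of unsupported claims. This is a toy stand-in, not MM-RAGChecker itself. The token-overlap test, the `threshold` value, and the names `is_supported` and `check_claims` are assumptions; a real checker would more plausibly use an entailment model and would also handle image evidence.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(claim, evidence, threshold=0.6):
    """Assumed proxy for entailment: a claim counts as supported when at
    least `threshold` of its tokens appear in a single evidence passage."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return False
    return any(
        len(claim_tokens & tokens(passage)) / len(claim_tokens) >= threshold
        for passage in evidence
    )


def check_claims(claims, evidence):
    """Label each claim and return (labels, hallucination_rate)."""
    labels = [(claim, is_supported(claim, evidence)) for claim in claims]
    unsupported = sum(1 for _, ok in labels if not ok)
    rate = unsupported / len(claims) if claims else 0.0
    return labels, rate
```

Scoring at the claim level rather than the answer level shows which specific statements lack support, which is the granularity needed before attributing a failure to retrieval or to generation.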