Hydra: An Agentic Reasoning Approach for Enhancing Adversarial Robustness and Mitigating Hallucinations in Vision-Language Models

📅 2025-04-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the dual challenge of poor adversarial robustness and frequent hallucination in vision-language models (VLMs) in high-stakes applications. We propose the first fine-tuning-free, defense-agnostic unified optimization framework for VLMs: a plug-and-play, iterative agent-based reasoning architecture that integrates action-critique loops, structured critique generation, cross-model factual verification, visual information retrieval, and dynamic output correction. Our approach jointly optimizes for adversarial robustness and hallucination suppression. Extensive evaluation across four mainstream VLMs demonstrates consistent and significant improvements over baselines and state-of-the-art hallucination-mitigation methods on three hallucination benchmarks and under two types of adversarial attacks. Factual consistency and robustness are enhanced simultaneously, without requiring additional training, architectural modifications, or task-specific adaptation.
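To make the architecture concrete, below is a minimal Python sketch of the action-critique loop at the core of the framework. All names here (Critique, generate, critique, revise) are hypothetical placeholders rather than the authors' actual interface, and the real system additionally performs visual information retrieval and multi-model cross-verification inside each iteration.

```python
from dataclasses import dataclass

@dataclass
class Critique:
    accepts: bool   # True when the critic finds no remaining issues
    feedback: str   # structured critique text fed back to the generator

def action_critique_loop(agent, image, question, max_iters=3):
    """Iteratively refine a VLM answer via an action-critique loop."""
    answer = agent.generate(image, question)                  # action: draft an answer
    for _ in range(max_iters):
        crit = agent.critique(image, question, answer)        # structured critique
        if crit.accepts:                                      # converged: keep the answer
            break
        answer = agent.revise(image, question, answer, crit)  # dynamic output correction
    return answer
```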

📝 Abstract
To develop trustworthy Vision-Language Models (VLMs), it is essential to address adversarial robustness and hallucination mitigation, both of which impact factual accuracy in high-stakes applications such as defense and healthcare. Existing methods primarily focus on either adversarial defense or post-hoc hallucination correction, leaving a gap in unified robustness strategies. We introduce Hydra, an adaptive agentic framework that enhances plug-in VLMs through iterative reasoning, structured critiques, and cross-model verification, improving resilience to both adversarial perturbations and intrinsic model errors. Hydra employs an Action-Critique Loop, in which it retrieves and critiques visual information, leveraging Chain-of-Thought (CoT) and In-Context Learning (ICL) techniques to refine outputs dynamically. Unlike static post-hoc correction methods, Hydra adapts to both adversarial manipulations and intrinsic model errors, making it robust to malicious perturbations and hallucination-related inaccuracies. We evaluate Hydra on four VLMs, three hallucination benchmarks, two adversarial attack strategies, and two adversarial defense methods, assessing performance on both clean and adversarial inputs. Results show that Hydra surpasses plug-in VLMs and state-of-the-art (SOTA) dehallucination methods, even without explicit adversarial defenses, demonstrating enhanced robustness and factual consistency. By bridging adversarial resistance and hallucination mitigation, Hydra provides a scalable, training-free solution for improving the reliability of VLMs in real-world applications.
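As one illustration of the cross-model verification step named in the abstract, the sketch below has several independent VLMs vote on each claim and keeps only majority-approved claims. The `verify` method and the majority-vote aggregation are assumptions for illustration; the paper's exact verification protocol may differ.

```python
from collections import Counter

def cross_verify(models, image, claim):
    """Return True if a majority of models judge the claim consistent with the image."""
    votes = [m.verify(image, claim) for m in models]  # each model returns True/False
    return Counter(votes).most_common(1)[0][0]        # majority vote over the models

def prune_hallucinations(models, image, claims):
    """Keep only claims that survive cross-model factual verification."""
    return [c for c in claims if cross_verify(models, image, c)]
```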
Problem

Research questions and friction points this paper is trying to address.

Enhancing adversarial robustness in Vision-Language Models
Mitigating hallucinations to improve factual accuracy
Unifying adversarial defense and hallucination correction strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive agentic framework for unified robustness
Action-Critique Loop with CoT and ICL (see the prompt sketch after this list)
Training-free solution enhancing VLM reliability
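The sketch below shows one way a critique prompt could combine In-Context Learning (ICL) exemplars with a Chain-of-Thought (CoT) instruction, as named in the innovation list above. The exemplar and wording are invented for illustration; the paper's actual prompts are not reproduced here.

```python
# A hypothetical ICL exemplar pairing a draft answer with a structured critique.
ICL_EXEMPLARS = [
    {
        "answer": "A red stop sign on the left side of the road.",
        "critique": "The sign is visible, but it is on the right side; "
                    "flag the spatial claim for revision.",
    },
]

def build_critique_prompt(question, answer):
    """Assemble a CoT critique prompt seeded with ICL exemplars."""
    lines = ["You are a critic. Think step by step and list any claims in the "
             "answer that are not supported by the image."]
    for ex in ICL_EXEMPLARS:  # prepend in-context exemplars
        lines.append(f"Answer: {ex['answer']}\nCritique: {ex['critique']}")
    lines.append(f"Question: {question}\nAnswer: {answer}\nCritique:")
    return "\n\n".join(lines)
```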