P²-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization

2026-06-02
Citations: 0
Influential: 0

AI Summary
This work addresses the perceptual bottlenecks commonly observed in large vision-language models, which often manifest as attentional bias and insufficient robustness under image degradation. To mitigate these issues, the authors propose P²-DPO, a novel training paradigm built upon the Direct Preference Optimization (DPO) framework. P²-DPO introduces, for the first time, a perception-oriented online self-generated preference pair mechanism and incorporates a calibration loss to achieve causal alignment between vision and language modalities. Notably, this approach operates without human feedback and, at comparable training cost, substantially enhances model performance in terms of attention region fidelity and robustness in degraded visual conditions, thereby improving both perceptual accuracy and visual reliability.
šŸ“ Abstract
Hallucination in Large Vision-Language Models (LVLMs) has recently garnered significant research attention. Direct Preference Optimization (DPO) addresses it by learning directly from corrected preferences provided by humans. Despite its success, this paradigm has yet to specifically target the perceptual bottleneck in attended regions or the insufficient Visual Robustness against image degradation. Furthermore, existing preference pairs are often vision-agnostic, and their inherently off-policy nature limits their effectiveness in guiding model learning. To address these challenges, we propose Perceptual Processing Direct Preference Optimization (P²-DPO), a novel training paradigm in which the model generates and learns from its own preference pairs, thereby directly addressing the identified visual bottlenecks while inherently avoiding the issues of vision-agnostic and off-policy data. It introduces: (1) an on-policy preference pair construction method targeting Focus-and-Enhance perception and Visual Robustness, and (2) a Calibration Loss designed to precisely align visual signals with the causal generation of text. On benchmark evaluations, and with a comparable amount of training data and cost, P²-DPO outperforms strong baselines that rely on costly human feedback. Furthermore, evaluations on Attention Region Fidelity (ARF) and image degradation scenarios validate the effectiveness of P²-DPO in addressing the perceptual bottleneck in attended regions and improving Visual Robustness against degraded inputs.
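For readers unfamiliar with the DPO framework that the abstract builds on, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the paper's Calibration Loss, which the abstract does not specify. The function name, argument names, and the value `beta=0.1` are illustrative assumptions; the inputs are sequence-level log-probabilities of the chosen and rejected responses under the trained policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (hypothetical helper).

    All arguments are summed log-probabilities of a full response.
    beta=0.1 is an illustrative value, not one taken from the paper.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(beta * margin)), computed as softplus(z) with
    # z = -beta * margin, in a numerically stable form.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: zero margin, loss equals log(2).
neutral = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy has shifted probability toward the chosen response: lower loss.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
```

In an on-policy setting such as the one the abstract describes, the chosen and rejected responses would be sampled from the model being trained rather than taken from a fixed human-annotated dataset; how P²-DPO scores and selects those pairs is not stated in this summary.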
Problem

Research questions and friction points this paper is trying to address.

Hallucination
Perceptual Bottleneck
Visual Robustness
Direct Preference Optimization
Vision-Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Perceptual Processing
Direct Preference Optimization
Visual Robustness
On-policy Preference Pairs
Calibration Loss
Authors
Ruipeng Zhang
Guangdong Provincial Key Laboratory of Computational AI Models and Cognitive Intelligence, School of Computer Science & Engineering, South China University of Technology; Pazhou Lab, Guangzhou, China; Engineering Research Center of the Ministry of Education on Health Intelligent Perception and Paralleled Digital-Human, Guangzhou, China
Zhihao Li
The Hong Kong University of Science and Technology (Guangzhou)
AI for Science, AI for PDE, Graph Neural Networks
Haozhang Yuan
Guangdong Provincial Key Laboratory of Computational AI Models and Cognitive Intelligence, School of Computer Science & Engineering, South China University of Technology; Pazhou Lab, Guangzhou, China; Engineering Research Center of the Ministry of Education on Health Intelligent Perception and Paralleled Digital-Human, Guangzhou, China
C. L. Philip Chen
Guangdong Provincial Key Laboratory of Computational AI Models and Cognitive Intelligence, School of Computer Science & Engineering, South China University of Technology; Pazhou Lab, Guangzhou, China; Engineering Research Center of the Ministry of Education on Health Intelligent Perception and Paralleled Digital-Human, Guangzhou, China
Tong Zhang
South China University of Technology, China
Computer Science, Artificial Intelligence, Affective Computing