Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention

📅 2025-12-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision-language models (VLMs) employ entropy regularization only during the policy-update step of reinforcement learning (RL) fine-tuning, neglecting its potential to modulate response diversity during the RL sampling phase. To address this, we propose Selective Adversarial Entropy Intervention (SaEI), a novel framework that actively increases policy entropy during RL sampling via vision-input perturbation. Specifically, SaEI introduces Entropy-guided Adversarial Sampling (EgAS), which formulates entropy as an optimizable adversarial objective, and Token-selective Entropy Computation (TsEC), enabling semantic-preserving, gradient-driven visual perturbations. This work is the first to synergistically integrate entropy regularization and adversarial attacks within the RL sampling stage of VLMs to enhance sampling diversity. Experiments demonstrate that SaEI significantly improves policy exploration, answer diversity, and reasoning accuracy on both in-domain and cross-domain visual reasoning benchmarks, while preserving factual knowledge integrity.

📝 Abstract
Recently, reinforcement learning (RL) has become a common choice for enhancing the reasoning capabilities of vision-language models (VLMs). Among existing RL-based finetuning methods, entropy intervention has proven an effective way to improve exploratory ability and thereby policy performance. Notably, most existing studies intervene in entropy simply by controlling the update of specific tokens during the policy-optimization phase of RL. They ignore entropy intervention during RL sampling, which can boost the performance of GRPO by improving the diversity of responses. In this paper, we propose Selective-adversarial Entropy Intervention (SaEI), which enhances policy entropy by distorting the visual input with a token-selective adversarial objective derived from the entropy of sampled responses. Specifically, we first propose entropy-guided adversarial sampling (EgAS), which formulates the entropy of sampled responses as an adversarial objective. The corresponding adversarial gradient is then used to attack the visual input and produce adversarial samples, allowing the policy model to explore a larger answer space during RL sampling. We further propose token-selective entropy computation (TsEC) to maximize the effectiveness of the adversarial attack in EgAS without distorting factual knowledge within VLMs. Extensive experiments on both in-domain and out-of-domain datasets show that our proposed method can greatly improve policy exploration via entropy intervention and thus boost reasoning capabilities. Code will be released once the paper is accepted.
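The mechanics described above can be illustrated with a minimal toy sketch (not the authors' implementation, which operates on a full VLM): a linear "policy" maps a flattened visual input to per-position token logits, entropy is computed only over a selected subset of response positions (the TsEC idea), and the analytic gradient of that entropy with respect to the input provides the adversarial perturbation direction (the EgAS idea). All function names, shapes, and the linear policy are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def selective_entropy(W, x, mask):
    """Mean Shannon entropy over the selected token positions only (TsEC-style).

    W: (T, V, D) per-position logit weights (toy stand-in for the policy),
    x: (D,) flattened visual input, mask: (T,) booleans selecting positions.
    """
    H = 0.0
    for t in np.flatnonzero(mask):
        p = softmax(W[t] @ x)
        H += -(p * np.log(p)).sum()
    return H / mask.sum()

def entropy_gradient(W, x, mask):
    """Analytic gradient of selective_entropy w.r.t. the visual input x.

    Uses dH/dz_i = -p_i * (log p_i + H) for each position's logits z = W_t @ x,
    then chains back through the linear map: dH/dx = W_t.T @ dH/dz.
    """
    g = np.zeros_like(x)
    for t in np.flatnonzero(mask):
        p = softmax(W[t] @ x)
        H_t = -(p * np.log(p)).sum()
        dH_dz = -p * (np.log(p) + H_t)
        g += W[t].T @ dH_dz
    return g / mask.sum()

def entropy_ascent_step(W, x, mask, eps=0.01):
    """One small ascent step on the input, raising response entropy (EgAS-style)."""
    return x + eps * entropy_gradient(W, x, mask)
```

Under this sketch, sampling from the perturbed input `entropy_ascent_step(W, x, mask)` draws from a flatter token distribution than sampling from `x`, which is the claimed mechanism for broader answer-space exploration; restricting `mask` to non-factual positions is what would keep factual tokens untouched.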
Problem

Research questions and friction points this paper is trying to address.

Enhancing RL-based visual reasoning via selective adversarial entropy intervention
Improving policy exploration by distorting visual inputs with adversarial objectives
Boosting reasoning capabilities without distorting factual knowledge in VLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective adversarial entropy intervention distorts visual inputs
Entropy-guided adversarial sampling expands answer space diversity
Token-selective computation preserves factual knowledge during attacks
Yang Yu
The Hong Kong University of Science and Technology
Zhuangzhuang Chen
The Hong Kong University of Science and Technology
Siqi Wang
The Hong Kong University of Science and Technology
Lanqing Li
Zhejiang Lab, The Chinese University of Hong Kong
Machine Learning · AI for Science · Reinforcement Learning · AI for Drug Discovery
Xiaomeng Li
Assistant Professor, The Hong Kong University of Science and Technology
Medical Image Analysis · AI in Healthcare · Deep Learning