Refine-IQA: Multi-Stage Reinforcement Finetuning for Perceptual Image Quality Assessment

📅 2025-08-04

🤖 AI Summary

Existing reinforcement fine-tuning (RFT)-based image quality assessment (IQA) methods suffer from two key limitations: (1) no explicit reward supervision for the model's "think" (reasoning) process, and (2) no explicit enhancement of low-level visual quality perception. To address these, we propose Refine-IQA, a multi-stage RFT framework. In the first stage, we construct Refine-Perception-20K, a dataset of 20,907 locally distorted images spanning 12 main distortion types, and design multi-task reward functions to strengthen visual quality perception. In the second stage, which targets quality scoring, we introduce a probability difference reward to supervise the chain-of-thought (CoT) "think" process. The resulting models achieve strong results across perception, scoring, and quality interpretation benchmarks, and the training paradigm also yields a robust quality interpretation capability.

📝 Abstract

Reinforcement fine-tuning (RFT) is an increasingly popular paradigm for training large multimodal models (LMMs). As with high-level reasoning tasks, RFT is also applicable to low-level vision domains, including image quality assessment (IQA). Existing RFT-based IQA methods typically use rule-based output rewards to verify the model's rollouts but provide no reward supervision for the "think" process, leaving its correctness and efficacy uncontrolled. Furthermore, these methods typically fine-tune directly on downstream IQA tasks without explicitly enhancing the model's native low-level visual quality perception, which may constrain its performance upper bound. In response to these gaps, we propose a multi-stage RFT IQA framework (Refine-IQA). In Stage-1, we build the Refine-Perception-20K dataset (with 12 main distortions, 20,907 locally distorted images, and over 55K RFT samples) and design multi-task reward functions to strengthen the model's visual quality perception. In Stage-2, targeting the quality scoring task, we introduce a strategy that incorporates a probability difference reward to supervise the "think" process. The resulting Refine-IQA Series Models achieve outstanding performance on both perception and scoring tasks. Notably, our paradigm also activates a robust "think" (quality interpreting) capability that attains exceptional results on the corresponding quality interpreting benchmark.
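The abstract does not spell out how the Stage-2 rewards are computed. The following is a minimal sketch of one plausible reading, not the authors' formulation. It assumes the probability difference reward measures how much the "think" trace raises the model's probability of the reference answer, and that it is added to a rule-based output reward with a fixed weight. All function names, the tolerance, and the weight are hypothetical.

```python
import math


def score_accuracy_reward(predicted: float, target: float, tolerance: float = 0.5) -> float:
    """Rule-based output reward (assumed form): 1.0 if the predicted
    quality score is within `tolerance` of the target score, else 0.0."""
    return 1.0 if abs(predicted - target) <= tolerance else 0.0


def probability_difference_reward(logp_with_think: float, logp_without_think: float) -> float:
    """Hypothetical 'think' reward: the gain in probability of the reference
    answer when the model conditions on its own think trace.
    Inputs are log-probabilities of the reference answer."""
    return math.exp(logp_with_think) - math.exp(logp_without_think)


def total_reward(
    predicted: float,
    target: float,
    logp_with_think: float,
    logp_without_think: float,
    think_weight: float = 0.5,  # assumed weighting, not from the paper
) -> float:
    """Combine the output reward and the weighted think reward."""
    output_term = score_accuracy_reward(predicted, target)
    think_term = probability_difference_reward(logp_with_think, logp_without_think)
    return output_term + think_weight * think_term


if __name__ == "__main__":
    # The think trace raises the reference-answer probability from 0.5 to 0.8.
    reward = total_reward(3.2, 3.0, math.log(0.8), math.log(0.5))
    print(f"total reward: {reward:.2f}")
```

Under this reading, a think trace that makes the reference answer less likely yields a negative think term, so the rollout is penalized even when its final score happens to be within tolerance.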

Problem

Research questions and friction points this paper is trying to address.

Enhance visual quality perception in IQA models

Supervise 'think' process in reinforcement fine-tuning

Improve performance upper bound in IQA tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage reinforcement fine-tuning for IQA

Multi-task reward functions enhance perception

Probability difference reward supervises think process
