🤖 AI Summary
This work addresses the challenge that existing image aesthetic assessment methods struggle to capture subtle aesthetic differences and thus fail to support fine-grained ranking. To this end, the authors formally define the fine-grained image aesthetic assessment task for the first time, introduce the FGAesthetics dataset comprising 32,217 images, and propose the FGAesQ framework, which learns discriminative aesthetic scores through relative ranking. FGAesQ integrates three core components: DiffToken for preserving fine-grained distinctions, a text-assisted CTAlign alignment mechanism, and a ranking-aware RankReg regression strategy, further enhanced by data refinement and label calibration to improve annotation quality. Experiments demonstrate that FGAesQ significantly outperforms state-of-the-art methods on fine-grained tasks while maintaining competitive performance in conventional coarse-grained aesthetic assessment, confirming its effectiveness and generalization capability.
📝 Abstract
Image aesthetic assessment (IAA) has extensive applications in content creation, album management, and recommendation systems. Such applications often need to pick the most aesthetically pleasing image from a series of images with subtle aesthetic variations, a task we refer to as fine-grained IAA. Unfortunately, state-of-the-art IAA models are typically designed for coarse-grained evaluation, where images with notable aesthetic differences are evaluated independently on an absolute scale. These models are inherently limited in discriminating fine-grained aesthetic differences. To address this limitation, we contribute FGAesthetics, a fine-grained IAA database with 32,217 images organized into 10,028 series, sourced from diverse categories including Natural, AIGC, and Cropping. Annotations are collected via pairwise comparisons within each series. We also devise Series Refinement and Rank Calibration to ensure the reliability of data and labels. Based on FGAesthetics, we further propose FGAesQ, a novel IAA framework that learns discriminative aesthetic scores from relative ranks through Difference-preserved Tokenization (DiffToken), Comparative Text-assisted Alignment (CTAlign), and Rank-aware Regression (RankReg). FGAesQ enables accurate aesthetic assessment in fine-grained scenarios while still maintaining competitive performance in coarse-grained evaluation. Extensive experiments and comparisons demonstrate the superiority of the proposed method.
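To make the idea of pairwise annotation within a series concrete, the sketch below shows one simple way such comparisons could be turned into a within-series ranking: count how many comparisons each image wins and sort by that count. This is an illustrative assumption, not the aggregation procedure used for FGAesthetics, which the abstract does not describe. The function name `rank_series` and the `(winner_id, loser_id)` input format are hypothetical.

```python
from collections import defaultdict


def rank_series(image_ids, comparisons):
    """Order the images of one series by number of pairwise wins.

    image_ids:   list of image identifiers in the series (hypothetical format).
    comparisons: iterable of (winner_id, loser_id) pairs, one per annotation.
    Returns the identifiers from most to least preferred.
    """
    wins = defaultdict(int)
    for winner, _loser in comparisons:
        wins[winner] += 1
    # sorted() is stable, so images with equal win counts keep their input order.
    return sorted(image_ids, key=lambda image_id: -wins[image_id])


# Toy series of three images with three pairwise judgments.
series = ["a", "b", "c"]
votes = [("b", "a"), ("b", "c"), ("c", "a")]
ranking = rank_series(series, votes)
```

Win counting is the simplest possible aggregation. With noisy or incomplete comparisons, a probabilistic model of pairwise preference would likely be a better fit.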