🤖 AI Summary
Existing image similarity metrics such as LPIPS and CLIP often fail to align with human subjective judgments in text-to-image generation tasks, particularly in personalized or context-sensitive scenarios. This work proposes CLPIPS, which leverages user-provided ranking feedback on generated images to fine-tune the layer combination weights of LPIPS through a lightweight adaptation. By optimizing these weights with a margin-based ranking loss on human-annotated data, CLPIPS aligns perceptual similarity scores with individual users' preferences. Agreement with human judgments is evaluated using Spearman's rank correlation coefficient and the intraclass correlation coefficient. Experimental results show that CLPIPS significantly outperforms the original LPIPS in capturing user preferences, validating the efficacy of lightweight, personalized fine-tuning for perceptual similarity assessment.
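As a minimal illustration of the Spearman evaluation mentioned above, the sketch below computes the rank correlation between a metric's distances and human similarity ranks using SciPy. The distances and ranks are made-up demonstration values, not data from the paper:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical example: metric distances and human ranks for five
# generated images of one target (rank 1 = judged most similar).
# These numbers are illustrative, not taken from the study.
metric_dist = np.array([0.12, 0.45, 0.31, 0.60, 0.22])
human_rank = np.array([1, 4, 3, 5, 2])

# Spearman's rho compares the two orderings, not the raw values.
rho, p_value = spearmanr(metric_dist, human_rank)
# rho = 1.0 here: the metric orders the images exactly as the human did.
```

A perfectly aligned metric yields rho = 1.0 regardless of the absolute distance values, which is why the paper emphasizes ranking consistency rather than absolute metric scores.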
📄 Abstract
Iterative prompt refinement is central to reproducing target images with text-to-image generative models. Previous studies have incorporated image similarity metrics (ISMs) as additional feedback for human users. Existing ISMs such as LPIPS and CLIP provide objective measures of image likeness but often fail to align with human judgments, particularly in context-specific or user-driven tasks. In this paper, we introduce Customized Learned Perceptual Image Patch Similarity (CLPIPS), a customized extension of LPIPS that adapts a metric's notion of similarity directly to human judgments. We explore whether lightweight, human-augmented fine-tuning can meaningfully improve perceptual alignment, positioning similarity metrics as adaptive components for human-in-the-loop workflows with text-to-image tools. We evaluate CLPIPS on a human-subject dataset in which participants iteratively regenerate target images and rank the generated outputs by perceived similarity. Using a margin ranking loss on human-ranked image pairs, we fine-tune only the LPIPS layer combination weights and assess alignment via Spearman rank correlation and the intraclass correlation coefficient. Our results show that CLPIPS achieves stronger correlation and agreement with human judgments than baseline LPIPS. Rather than optimizing absolute metric performance, our work emphasizes consistency between metric predictions and human ranks, demonstrating that even limited, human-specific fine-tuning can meaningfully enhance perceptual alignment in human-in-the-loop text-to-image workflows.
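The core idea, fine-tuning only the layer combination weights with a margin ranking loss on human-ranked pairs, can be sketched in PyTorch. This is a toy stand-in, not the authors' implementation: the per-layer feature distances are synthetic random values standing in for distances produced by a frozen LPIPS backbone, and the separation between "preferred" and "dispreferred" pairs is constructed for demonstration:

```python
import torch
import torch.nn as nn

class WeightedLayerSimilarity(nn.Module):
    """Toy stand-in for the LPIPS layer combination: one learnable
    nonnegative weight per feature layer, applied to precomputed
    per-layer distances (the backbone itself stays frozen, as in LPIPS)."""
    def __init__(self, n_layers: int = 5):
        super().__init__()
        self.raw_w = nn.Parameter(torch.zeros(n_layers))

    def forward(self, layer_dists: torch.Tensor) -> torch.Tensor:
        # softplus keeps the combination weights nonnegative
        w = torch.nn.functional.softplus(self.raw_w)
        return layer_dists @ w  # one scalar distance per image pair

torch.manual_seed(0)
model = WeightedLayerSimilarity()
opt = torch.optim.Adam(model.parameters(), lr=0.05)
rank_loss = nn.MarginRankingLoss(margin=0.1)

# Synthetic human feedback: in each pair, image A was ranked closer to
# the target than image B, so the metric should assign d(A) < d(B).
d_a = torch.rand(64, 5) * 0.5          # per-layer distances, preferred images
d_b = torch.rand(64, 5) * 0.5 + 0.3    # per-layer distances, dispreferred images

for _ in range(200):
    opt.zero_grad()
    s_a, s_b = model(d_a), model(d_b)
    # target = -1 encodes "the first input should be the smaller one"
    loss = rank_loss(s_a, s_b, target=-torch.ones_like(s_a))
    loss.backward()
    opt.step()

with torch.no_grad():
    # fraction of pairs the fine-tuned metric orders as the annotator did
    acc = (model(d_a) < model(d_b)).float().mean()
```

Only the handful of combination weights is trained, which is what makes this kind of adaptation lightweight enough to personalize from the small amount of ranking feedback a single user can provide.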