E-comIQ-ZH: A Human-Aligned Dataset and Benchmark for Fine-Grained Evaluation of E-commerce Posters with Chain-of-Thought

📅 2026-02-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing automatic evaluation methods struggle to perform fine-grained quality assessment of Chinese e-commerce posters, particularly due to deficiencies in evaluating functional design and complex Chinese text artifacts. To address this gap, this work introduces E-comIQ-18k, the first multidimensional dataset for Chinese e-commerce posters featuring human ratings and expert-calibrated chain-of-thought (CoT) annotations. Leveraging this dataset, we train a specialized evaluation model, E-comIQ-M, and establish E-comIQ-Bench, a scalable automated benchmark for poster quality assessment. Our approach pioneers the integration of expert-calibrated CoT annotations to construct a function-oriented, fine-grained evaluation framework. Experimental results demonstrate that E-comIQ-M significantly outperforms existing methods in alignment with human expert judgments, enabling efficient and scalable automatic evaluation of generated Chinese e-commerce posters.

📝 Abstract

Generative AI is widely used to create commercial posters. However, rapid advances in generation have outpaced automated quality assessment. Existing models emphasize generic aesthetics or low-level distortions and lack the functional criteria required for e-commerce design. Assessment is especially challenging for Chinese content, where complex characters often produce subtle but critical textual artifacts that existing methods overlook. To address this, we introduce E-comIQ-ZH, a framework for evaluating Chinese e-commerce posters. We build E-comIQ-18k, the first dataset to feature multidimensional scores and expert-calibrated Chain-of-Thought (CoT) rationales. Using this dataset, we train E-comIQ-M, a specialized evaluation model that aligns with human expert judgment. Our framework enables E-comIQ-Bench, the first automated and scalable benchmark for Chinese e-commerce poster generation. Extensive experiments show that E-comIQ-M aligns more closely with expert standards than existing methods and enables scalable automated assessment of e-commerce posters. All datasets, models, and evaluation tools will be released to support future research in this area. Code will be available at https://github.com/4mm7/E-comIQ-ZH.

Problem

Research questions and friction points this paper is trying to address.

e-commerce posters

quality assessment

Chinese text artifacts

human-aligned evaluation

generative AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

E-commerce Poster Evaluation

Chain-of-Thought

Human-Aligned Assessment