E-comIQ-ZH: A Human-Aligned Dataset and Benchmark for Fine-Grained Evaluation of E-commerce Posters with Chain-of-Thought

📅 2026-02-25
🤖 AI Summary
Existing automatic evaluation methods struggle to perform fine-grained quality assessment of Chinese e-commerce posters, particularly due to deficiencies in evaluating functional design and complex Chinese text artifacts. To address this gap, this work introduces E-comIQ-18k, the first multidimensional dataset for Chinese e-commerce posters featuring human ratings and expert-calibrated chain-of-thought (CoT) annotations. Leveraging this dataset, we train a specialized evaluation model, E-comIQ-M, and establish E-comIQ-Bench, a scalable automated benchmark for poster quality assessment. Our approach pioneers the integration of expert-calibrated CoT annotations to construct a function-oriented, fine-grained evaluation framework. Experimental results demonstrate that E-comIQ-M significantly outperforms existing methods in alignment with human expert judgments, enabling efficient and scalable automatic evaluation of generated Chinese e-commerce posters.

📝 Abstract
Generative AI is widely used to create commercial posters. However, rapid advances in generation have outpaced automated quality assessment. Existing models emphasize generic esthetics or low-level distortions and lack the functional criteria required for e-commerce design. Assessment is especially challenging for Chinese content, where complex characters often produce subtle but critical textual artifacts that existing methods overlook. To address this, we introduce E-comIQ-ZH, a framework for evaluating Chinese e-commerce posters. We build E-comIQ-18k, the first dataset to feature multidimensional scores and expert-calibrated Chain-of-Thought (CoT) rationales. Using this dataset, we train E-comIQ-M, a specialized evaluation model that aligns with human expert judgment. Our framework enables E-comIQ-Bench, the first automated and scalable benchmark for the generation of Chinese e-commerce posters. Extensive experiments show that E-comIQ-M aligns more closely with expert standards than existing methods and enables scalable automated assessment of e-commerce posters. All datasets, models, and evaluation tools will be released to support future research in this area. Code will be available at https://github.com/4mm7/E-comIQ-ZH.
Problem

Research questions and friction points this paper is trying to address.

e-commerce posters
quality assessment
Chinese text artifacts
human-aligned evaluation
generative AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

E-commerce Poster Evaluation
Chain-of-Thought
Human-Aligned Assessment
Chinese Text Artifacts
Generative AI Benchmark
Meiqi Sun
School of Journalism and Communication, Tsinghua University
Media Effects, Digital Divide, Environmental Communication, Media Psychology
Mingyu Li
Taobao & Tmall Group, Alibaba Group
Junxiong Zhu
Taobao & Tmall Group, Alibaba Group