KidsArtBench: Multi-Dimensional Children's Art Evaluation with Attribute-Aware MLLMs

📅 2025-12-13
🤖 AI Summary
To address the lack of multidimensional, interpretable, and educationally aligned methods for aesthetic assessment of children’s artwork, this paper introduces KidsArtBench, a benchmark of artwork by children aged 5–15. It comprises more than 1,000 artworks annotated by 12 expert educators along nine rubric-aligned dimensions, together with expert comments, and supports both ordinal scoring and formative feedback. Methodologically, the authors propose an attribute-specific multi-LoRA approach, in which each attribute corresponds to one evaluation dimension of the rubric (e.g., Realism, Imagination), coupled with Regression-Aware Fine-Tuning (RAFT) to align predictions with the ordinal scales. On Qwen2.5-VL-7B, the method raises Spearman correlation from 0.468 to 0.653 (+0.185), with the largest gains on perceptual dimensions and narrowed gaps on higher-order attributes. Data, code, and ethics documentation are released.

📝 Abstract
Multimodal Large Language Models (MLLMs) show remarkable progress across many visual-language tasks; however, their capacity to evaluate artistic expression remains limited. Aesthetic concepts are inherently abstract and open-ended, and multimodal artwork annotations are scarce. We introduce KidsArtBench, a new benchmark of over 1k children's artworks (ages 5-15) annotated by 12 expert educators across 9 rubric-aligned dimensions, together with expert comments for feedback. Unlike prior aesthetic datasets that provide single scalar scores on adult imagery, KidsArtBench targets children's artwork and pairs multi-dimensional annotations with comment supervision to enable both ordinal assessment and formative feedback. Building on this resource, we propose an attribute-specific multi-LoRA approach, where each attribute corresponds to a distinct evaluation dimension (e.g., Realism, Imagination) in the scoring rubric, with Regression-Aware Fine-Tuning (RAFT) to align predictions with ordinal scales. On Qwen2.5-VL-7B, our method increases correlation from 0.468 to 0.653, with the largest gains on perceptual dimensions and narrowed gaps on higher-order attributes. These results show that educator-aligned supervision and attribute-aware training yield pedagogically meaningful evaluations and establish a rigorous testbed for sustained progress in educational AI. We release data and code with ethics documentation.
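The headline numbers (0.468 to 0.653) are correlations between model scores and educator scores; the AI summary names the metric as Spearman correlation. As a minimal sketch of how such a rank correlation could be computed for one rubric dimension, the following pure-Python example uses average ranks for ties. The function names and the sample scores are illustrative assumptions, not taken from the paper's code.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical educator and model scores for one dimension (made-up values).
educator = [1, 2, 3, 4, 5]
model = [2, 4, 6, 8, 10]
rho = spearman(educator, model)
```

Because only the ordering matters, a model whose scores are on a different scale from the educators' still reaches a correlation of 1 when it ranks the artworks identically.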
Problem

Research questions and friction points this paper is trying to address.

Evaluating children's artwork across multiple educational dimensions
Addressing limited MLLM capabilities in abstract aesthetic assessment
Providing ordinal scoring and formative feedback for artistic expression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute-specific multi-LoRA approach for evaluation dimensions
Regression-Aware Fine-Tuning to align with ordinal scales
Educator-aligned supervision for pedagogically meaningful assessments
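One common way to make a language model's scoring regression-aware is to read its output as a probability distribution over the allowed score levels, take the expected score, and penalize the squared distance to the human score. The sketch below illustrates that idea only; the 1 to 5 scale, the function names, and the logits are assumptions, and the paper's exact RAFT formulation may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_score(logits, levels):
    """Expected score under the softmax distribution over score levels."""
    probs = softmax(logits)
    return sum(p * s for p, s in zip(probs, levels))

def squared_error(logits, levels, target):
    """Squared distance between the expected score and the human score."""
    return (expected_score(logits, levels) - target) ** 2

# Assumed ordinal scale; the rubric's real scale is not stated on this page.
levels = [1, 2, 3, 4, 5]
uniform_logits = [0.0, 0.0, 0.0, 0.0, 0.0]

score = expected_score(uniform_logits, levels)      # mean of the levels
loss = squared_error(uniform_logits, levels, 4.0)   # distance to a score of 4
```

Unlike plain cross-entropy on the score token, this loss treats predicting 3 for a true 4 as a smaller error than predicting 1, which matches the ordinal nature of rubric scores. In an attribute-specific setup, one such objective would be applied per evaluation dimension.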
Mingrui Ye
King’s College London
Chanjin Zheng
East China Normal University
educational measurement, psychometrics, applied statistics
Zengyi Yu
East China Normal University
Chenyu Xiang
University of Sheffield
Zhixue Zhao
University of Sheffield
Zheng Yuan
University of Sheffield
Helen Yannakoudakis
Senior Lecturer, King’s College London
Machine learning, natural language processing