DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios

📅 2026-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches struggle to evaluate human creativity efficiently and at a fine-grained level in complex open-ended scenarios such as debate, because they rely on simplistic tasks, expert-annotated data are scarce, and automated scoring methods perform poorly. This work proposes a data-efficient computational framework built on a hierarchical eight-dimensional creativity assessment system tailored to debate. By combining a pretrained autoregressive language model with a hierarchical scoring head, and by using constrained data augmentation together with a mixed-granularity training strategy, the model learns robustly from limited expert annotations. Experiments show that the approach yields accurate and stable scores, outperforming both prompt-based large language model evaluators and existing debate-scoring methods. An empirical study with debate-naive participants additionally serves as a qualitative case study of its applicability in ecologically valid settings and for mid-to-low proficiency populations.
📝 Abstract
Human creativity has emerged as a critical competency in the era of large language models. Assessing creativity in complex, open-ended environments is a grand challenge in data mining, currently hindered by a reliance on standardized simple tasks and the scarcity of fine-grained expert data. As an ecologically valid assessment context, debate reflects multiple dimensions of creativity, encompassing both divergent thinking and convergent thinking. Moreover, debate is a data-rich domain, with a large volume of publicly accessible materials. Current mainstream automated scoring methods are poorly suited to complex settings such as debate, so assessment in these settings still relies on costly human evaluation. To this end, this paper proposes DEFINED, a data-efficient computational framework for fine-grained creativity assessment in debate scenarios. DEFINED operationalizes debate creativity through a hierarchical eight-dimensional metric system, implemented via a pre-trained autoregressive language model with a hierarchical scoring head that supports both fine-grained and coarse-grained evaluation. Statements and their associated expert scores were obtained from authentic debate competitions, and a constrained data augmentation strategy was employed to address the elite bias inherent in the original data. DEFINED adopts a mixed-granularity training strategy that enables robust learning from limited fine-grained supervision annotated by trained graduate experts. To examine ecological validity beyond synthetic benchmarks, we incorporate an empirical study with debate-naive participants, using these authentic data as a qualitative case study for mid-to-low proficiency populations. Across our evaluation protocol, our scoring model achieves accurate and stable scoring, outperforming prompt-based large language model evaluators and existing debate scoring methods.
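As a rough illustration of what mixed-granularity training could look like, the sketch below computes a loss for one statement that always uses the coarse score and adds a per-dimension term only when fine-grained expert labels exist. This is not the authors' implementation: the function names, the squared-error loss, the `fine_weight` parameter, and the choice of the mean as the coarse aggregate are all assumptions made for illustration, and the abstract does not specify any of them.

```python
from typing import Optional, Sequence

NUM_DIMS = 8  # the abstract states eight dimensions; their names are not given


def coarse_from_fine(fine: Sequence[float]) -> float:
    """Assumed aggregation: coarse score = mean of the fine-grained scores."""
    return sum(fine) / len(fine)


def mixed_granularity_loss(
    pred_fine: Sequence[float],
    coarse_label: float,
    fine_label: Optional[Sequence[float]] = None,
    fine_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Squared-error loss for one statement.

    The coarse term is always applied. The fine-grained term is added
    only for the (scarce) examples that carry per-dimension labels.
    """
    if len(pred_fine) != NUM_DIMS:
        raise ValueError(f"expected {NUM_DIMS} predicted scores")

    coarse_loss = (coarse_from_fine(pred_fine) - coarse_label) ** 2
    if fine_label is None:
        # Only a coarse competition score is available for this example.
        return coarse_loss

    if len(fine_label) != NUM_DIMS:
        raise ValueError(f"expected {NUM_DIMS} fine-grained labels")
    fine_loss = sum((p - t) ** 2 for p, t in zip(pred_fine, fine_label)) / NUM_DIMS
    return coarse_loss + fine_weight * fine_loss
```

Under these assumptions, examples with only a coarse score still contribute a training signal through the aggregate, while the smaller expert-annotated subset additionally constrains each of the eight dimensions.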
Problem

Research questions and friction points this paper is trying to address.

creativity assessment
debate scenarios
fine-grained evaluation
data efficiency
ecological validity
Innovation

Methods, ideas, or system contributions that make the work stand out.

fine-grained creativity assessment
data-efficient learning
hierarchical scoring
debate evaluation
constrained data augmentation