GENIE: A Fine-Grained Measure for Novelty

📅 2026-06-10
🤖 AI Summary
This work addresses the limited creativity and diversity that large language models often exhibit in generation tasks, and the inability of conventional evaluation metrics to capture fine-grained novelty. The authors propose GENIE, a task-aware, fine-grained metric for assessing novelty in generated text. GENIE scores a response along task-specific features, measured relative to a population of responses. The authors show that holistic metrics struggle to capture the high dimensionality of novelty and give no insight into which properties they target, and they use GENIE to measure where existing creativity-focused mitigation methods do and do not improve novelty.
📝 Abstract
Large Language Models have consistently demonstrated a lack of creativity and diversity across tasks. Prior work has focused on whether models are capable of generating creative outputs. Here, we consider novelty and investigate, in a task-specific manner, what makes model-generated content novel or not. We propose GENIE, a fine-grained evaluation metric that measures the novelty of a response along task-specific features with respect to a population of responses. We show that, unlike GENIE, holistic metrics struggle to capture the high dimensionality of novelty and do not provide insight into which properties they target. Finally, we use GENIE to measure the effectiveness of mitigation methods that address creativity, to better understand where these methods can improve novelty.
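A minimal sketch of the idea described in the abstract, not the authors' implementation. The abstract does not say how GENIE extracts features or computes scores, so this example assumes each response has already been reduced to categorical task-specific features, and it takes per-feature novelty to be one minus the fraction of the population that shares the same feature value. The names `feature_novelty` and `holistic_score`, the story features, and the rarity formula are all hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Assumption: a response is represented as feature name -> categorical value.
Response = Dict[str, str]


def feature_novelty(response: Response, population: List[Response]) -> Dict[str, float]:
    """Per-feature novelty: 1 - share of the population with the same value.

    Hypothetical scoring rule chosen for illustration only.
    """
    if not population:
        raise ValueError("population must be non-empty")
    scores: Dict[str, float] = {}
    for feature, value in response.items():
        # Count how often each value of this feature occurs in the population.
        counts = Counter(r.get(feature) for r in population)
        scores[feature] = 1.0 - counts[value] / len(population)
    return scores


def holistic_score(scores: Dict[str, float]) -> float:
    """Collapse per-feature scores into a single number by averaging."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(scores.values()) / len(scores)


# Hypothetical population of story responses, described by two features.
population = [
    {"setting": "forest", "protagonist": "knight"},
    {"setting": "forest", "protagonist": "knight"},
    {"setting": "forest", "protagonist": "wizard"},
    {"setting": "city", "protagonist": "wizard"},
]
candidate = {"setting": "city", "protagonist": "knight"}

per_feature = feature_novelty(candidate, population)
print(per_feature)                  # {'setting': 0.75, 'protagonist': 0.5}
print(holistic_score(per_feature))  # 0.625
```

In this toy case the averaged score of 0.625 hides that the candidate is more novel in its setting (0.75) than in its protagonist (0.5), which is the kind of distinction a per-feature view makes visible and a single holistic number does not.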
Problem

Research questions and friction points this paper is trying to address.

novelty
large language models
creativity
diversity
evaluation metric
Innovation

Methods, ideas, or system contributions that make the work stand out.

novelty
fine-grained evaluation
large language models
creativity
GENIE