MuseRAG: Idea Originality Scoring At Scale

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Manual coding of creative-idea originality is labor-intensive, unreliable, and hard to scale. This paper introduces a psychometrically validated LLM+RAG collaborative framework, the first of its kind, that integrates semantic clustering with frequency-based statistics to emulate human judgment, enabling zero-shot dynamic binning and quantitative originality scoring while maintaining reliability, validity, and alignment with human raters. Evaluated across five datasets comprising 1,143 participants and 16,294 ideas, the framework reaches an adjusted mutual information (AMI) of 0.59 for clustering consistency and a correlation of r = 0.89 between automated scores and expert annotations, outperforming conventional approaches. The result is a high-fidelity, interpretable, and scalable automated originality assessment system, offering a new paradigm for creativity science and AI-augmented innovation.

📝 Abstract
An objective, face-valid way to assess the originality of creative ideas is to measure how rare each idea is within a population -- an approach long used in creativity research but difficult to automate at scale. Tabulating response frequencies via manual bucketing of idea rephrasings is labor-intensive, error-prone, and brittle under large corpora. We introduce a fully automated, psychometrically validated pipeline for frequency-based originality scoring. Our method, MuseRAG, combines large language models (LLMs) with an externally orchestrated retrieval-augmented generation (RAG) framework. Given a new idea, the system retrieves semantically similar prior idea buckets and zero-shot prompts the LLM to judge whether the new idea belongs to an existing bucket or forms a new one. The resulting buckets enable computation of frequency-based originality metrics. Across five datasets (N=1143, n_ideas=16294), MuseRAG matches human annotators in idea clustering structure and resolution (AMI = 0.59) and in participant-level scoring (r = 0.89) -- while exhibiting strong convergent and external validity. Our work enables intent-sensitive, human-aligned originality scoring at scale to aid creativity research.
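The retrieve-then-judge loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: `retrieve` and `judge` here are toy stand-ins (a consider-all retriever and a token-overlap matcher) for the semantic retrieval index and zero-shot LLM judge the paper actually uses.

```python
from collections import Counter

def bucket_ideas(ideas, retrieve, judge):
    """Assign each idea to a bucket of rephrasings, one idea at a time.

    retrieve(idea, buckets) -> candidate bucket ids to show the judge (RAG step);
    judge(idea, candidates, buckets) -> matching bucket id, or None for a new
    bucket (the zero-shot LLM step). Both are injected so a real embedding
    index and LLM call can be swapped in.
    """
    buckets = {}      # bucket id -> list of member ideas
    assignment = []   # per-idea bucket id
    for idea in ideas:
        match = judge(idea, retrieve(idea, buckets), buckets)
        if match is None:            # the idea founds a new bucket
            match = len(buckets)
            buckets[match] = []
        buckets[match].append(idea)
        assignment.append(match)
    return buckets, assignment

def originality_scores(assignment):
    """Rarity-based originality: 1 - (bucket frequency / total ideas)."""
    counts = Counter(assignment)
    n = len(assignment)
    return [1 - counts[b] / n for b in assignment]

# Toy stand-ins for illustration only: the paper retrieves semantically
# similar buckets and prompts an LLM; here a token-overlap heuristic
# merely exercises the same control flow.
def retrieve(idea, buckets):
    return list(buckets)             # consider every existing bucket

def judge(idea, candidates, buckets):
    toks = set(idea.split())
    for b in candidates:
        if any(len(toks & set(m.split())) >= 3 for m in buckets[b]):
            return b
    return None

ideas = [
    "use a brick as a doorstop",
    "doorstop made from a brick",
    "grind the brick into red pigment",
]
buckets, assignment = bucket_ideas(ideas, retrieve, judge)
scores = originality_scores(assignment)
```

With these stand-ins, the first two rephrasings share a bucket and the rarer third idea receives the higher originality score; with a real retriever and LLM judge, the resulting per-participant scores are what the paper validates against human annotations.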
Problem

Research questions and friction points this paper is trying to address.

Automate rarity-based originality scoring for creative ideas
Replace manual bucketing with AI-driven semantic clustering
Validate scalable originality metrics against human judgments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline for originality scoring
Combines LLMs with RAG framework
Zero-shot prompts for idea bucketing
Ali Sarosh Bangash
Bellini College of Artificial Intelligence, Cybersecurity and Computing, University of South Florida, USA
Krish Veera
Bellini College of Artificial Intelligence, Cybersecurity and Computing, University of South Florida, USA
Ishfat Abrar Islam
Bellini College of Artificial Intelligence, Cybersecurity and Computing, University of South Florida, USA
Raiyan Abdul Baten
University of South Florida
Computational Social Science · Network Science · Affective Computing · Human-Computer Interaction