MuseRAG: Idea Originality Scoring At Scale

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Manual coding of creative-idea originality is labor-intensive, unreliable, and hard to scale. This paper introduces a psychometrically validated LLM+RAG collaborative framework, the first of its kind, that integrates semantic clustering with frequency-based statistics to emulate human judgment, enabling zero-shot dynamic binning and quantitative originality scoring while maintaining reliability, validity, and alignment with human raters. Evaluated across five datasets comprising 1,143 participants and 16,294 ideas, the framework reaches an adjusted mutual information (AMI) of 0.59 for clustering consistency and a correlation of r = 0.89 between automated scores and expert annotations, outperforming conventional approaches. The result is a high-fidelity, interpretable, and scalable automated originality assessment system, offering a new paradigm for creativity science and AI-augmented innovation.

📝 Abstract
An objective, face-valid way to assess the originality of creative ideas is to measure how rare each idea is within a population -- an approach long used in creativity research but difficult to automate at scale. Tabulating response frequencies via manual bucketing of idea rephrasings is labor-intensive, error-prone, and brittle under large corpora. We introduce a fully automated, psychometrically validated pipeline for frequency-based originality scoring. Our method, MuseRAG, combines large language models (LLMs) with an externally orchestrated retrieval-augmented generation (RAG) framework. Given a new idea, the system retrieves semantically similar prior idea buckets and zero-shot prompts the LLM to judge whether the new idea belongs to an existing bucket or forms a new one. The resulting buckets enable computation of frequency-based originality metrics. Across five datasets (N=1143, n_ideas=16294), MuseRAG matches human annotators in idea clustering structure and resolution (AMI = 0.59) and in participant-level scoring (r = 0.89) -- while exhibiting strong convergent and external validity. Our work enables intent-sensitive, human-aligned originality scoring at scale to aid creativity research.
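The retrieve-then-judge loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: `retrieve` and `judge` here are toy stand-ins (a consider-all retriever and a token-overlap matcher) for the semantic retrieval index and zero-shot LLM judge the paper actually uses.

```python
from collections import Counter

def bucket_ideas(ideas, retrieve, judge):
    """Assign each idea to a bucket of rephrasings, one idea at a time.

    retrieve(idea, buckets) -> candidate bucket ids to show the judge (RAG step);
    judge(idea, candidates, buckets) -> matching bucket id, or None for a new
    bucket (the zero-shot LLM step). Both are injected so a real embedding
    index and LLM call can be swapped in.
    """
    buckets = {}      # bucket id -> list of member ideas
    assignment = []   # per-idea bucket id
    for idea in ideas:
        match = judge(idea, retrieve(idea, buckets), buckets)
        if match is None:            # the idea founds a new bucket
            match = len(buckets)
            buckets[match] = []
        buckets[match].append(idea)
        assignment.append(match)
    return buckets, assignment

def originality_scores(assignment):
    """Rarity-based originality: 1 - (bucket frequency / total ideas)."""
    counts = Counter(assignment)
    n = len(assignment)
    return [1 - counts[b] / n for b in assignment]

# Toy stand-ins for illustration only: the paper retrieves semantically
# similar buckets and prompts an LLM; here a token-overlap heuristic
# merely exercises the same control flow.
def retrieve(idea, buckets):
    return list(buckets)             # consider every existing bucket

def judge(idea, candidates, buckets):
    toks = set(idea.split())
    for b in candidates:
        if any(len(toks & set(m.split())) >= 3 for m in buckets[b]):
            return b
    return None

ideas = [
    "use a brick as a doorstop",
    "doorstop made from a brick",
    "grind the brick into red pigment",
]
buckets, assignment = bucket_ideas(ideas, retrieve, judge)
scores = originality_scores(assignment)
```

With these stand-ins, the first two rephrasings share a bucket and the rarer third idea receives the higher originality score; with a real retriever and LLM judge, the resulting per-participant scores are what the paper validates against human annotations.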
Problem

Research questions and friction points this paper is trying to address.

Automate rarity-based originality scoring for creative ideas
Replace manual bucketing with AI-driven semantic clustering
Validate scalable originality metrics against human judgments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline for originality scoring
Combines LLMs with RAG framework
Zero-shot prompts for idea bucketing
Ali Sarosh Bangash
Bellini College of Artificial Intelligence, Cybersecurity and Computing, University of South Florida, USA
Krish Veera
Bellini College of Artificial Intelligence, Cybersecurity and Computing, University of South Florida, USA
Ishfat Abrar Islam
Bellini College of Artificial Intelligence, Cybersecurity and Computing, University of South Florida, USA
Raiyan Abdul Baten
University of South Florida
Computational Social Science · Network Science · Affective Computing · Human-Computer Interaction