Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment

📅 2025-04-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the misalignment between AI-generated multiple-choice questions (MCQs) and Bloom’s taxonomy cognitive levels. Method: We propose the first framework to deeply integrate Bloom’s six-level taxonomy throughout the AI-based item generation pipeline, implemented in the Moodle plugin OneClickQuiz. We develop a DistilBERT-based cognitive-level classifier, augmented with linguistic complexity features—Flesch-Kincaid Grade Level (FKGL) and lexical density—and baseline models including Multinomial Logistic Regression. To mitigate fine-grained classification bias, we introduce a novel multi-level merging strategy for higher-order cognitive categories (Analysis, Evaluation, Creation). Contribution/Results: DistilBERT achieves 91% overall classification accuracy—significantly outperforming baselines. Question stem length, FKGL, and lexical density increase significantly across ascending Bloom levels. The merging strategy improves classification accuracy for higher-order categories by 12.3%. This work establishes a new paradigm for interpretable, objective-aligned automated assessment generation in educational AI.
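The two linguistic complexity features named above (FKGL and lexical density) are straightforward to compute from a question stem. Below is a minimal stdlib Python sketch using the standard Flesch-Kincaid Grade Level formula and a vowel-group heuristic for syllable counting; the tokenization and the function-word list are illustrative assumptions, not the paper's implementation.

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: one syllable per vowel group (an assumption, not the paper's method).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    # Standard Flesch-Kincaid Grade Level formula:
    # 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

def lexical_density(text: str, function_words: frozenset) -> float:
    # Share of content words: tokens not found in a function-word list.
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    content = [w for w in words if w not in function_words]
    return len(content) / len(words)
```

Per the reported results, question stems at higher Bloom levels should score higher on both measures, which is why they are useful auxiliary features for the cognitive-level classifier.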

📝 Abstract
This study evaluates the integration of Bloom's Taxonomy into OneClickQuiz, an Artificial Intelligence (AI) driven plugin for automating Multiple-Choice Question (MCQ) generation in Moodle. Bloom's Taxonomy provides a structured framework for categorizing educational objectives into hierarchical cognitive levels. Our research investigates whether incorporating this taxonomy can improve the alignment of AI-generated questions with specific cognitive objectives. We developed a dataset of 3691 questions categorized according to Bloom's levels and employed various classification models (Multinomial Logistic Regression, Naive Bayes, Linear Support Vector Classification (SVC), and a Transformer-based model, DistilBERT) to evaluate their effectiveness in categorizing questions. Our results indicate that higher Bloom's levels generally correlate with increased question length, Flesch-Kincaid Grade Level (FKGL), and Lexical Density (LD), reflecting the increased complexity of higher cognitive demands. Multinomial Logistic Regression showed varying accuracy across Bloom's levels, performing best for "Knowledge" and less accurately for higher-order levels. Merging higher-level categories improved accuracy for complex cognitive tasks. Naive Bayes and Linear SVC also demonstrated effective classification for lower levels but struggled with higher-order tasks. DistilBERT achieved the highest performance, significantly improving classification of both lower and higher-order cognitive levels, achieving an overall validation accuracy of 91%. This study highlights the potential of integrating Bloom's Taxonomy into AI-driven assessment tools and underscores the advantages of advanced models like DistilBERT for enhancing educational content generation.
Problem

Research questions and friction points this paper is trying to address.

Evaluates AI-generated questions' alignment with Bloom's Taxonomy
Assesses classification models' accuracy in cognitive level categorization
Explores DistilBERT's effectiveness in improving question classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-driven MCQ generation using Bloom's Taxonomy
Classification models including DistilBERT for accuracy
Improved alignment of questions with cognitive objectives
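The multi-level merging strategy mentioned in the summary can be sketched as a simple label mapping applied before training and evaluation: the three higher-order Bloom levels (Analysis, Evaluation, Creation) collapse into a single class to reduce fine-grained confusion among them. The merged class name below is a hypothetical choice for illustration, not the paper's exact label.

```python
# Illustrative mapping for the merging strategy. The merged class
# name "Higher-Order" is hypothetical; the grouping follows the
# paper's description (Analysis, Evaluation, Creation merged).
MERGE_MAP = {
    "Knowledge": "Knowledge",
    "Comprehension": "Comprehension",
    "Application": "Application",
    "Analysis": "Higher-Order",
    "Evaluation": "Higher-Order",
    "Creation": "Higher-Order",
}

def merge_labels(labels: list) -> list:
    # Collapse fine-grained Bloom labels; unknown labels pass through unchanged.
    return [MERGE_MAP.get(label, label) for label in labels]
```

Training the classifier on the merged label set is what the summary reports as improving higher-order classification accuracy by 12.3%.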
Antoun Yaacoub
Associate Professor in AI
Artificial Intelligence, Neural networks, Machine learning, Deep learning
Jérôme Da-Rugna
Learning, Data and Robotics (LDR) ESIEA Lab, ESIEA, Paris, France
Zainab Assaghir
Faculty of Science, Lebanese University, Beirut, Lebanon