🤖 AI Summary
This study addresses the misalignment between AI-generated multiple-choice questions (MCQs) and the cognitive levels of Bloom’s taxonomy. Method: We propose the first framework to integrate Bloom’s six-level taxonomy throughout the AI-based item generation pipeline, implemented in the Moodle plugin OneClickQuiz. We develop a DistilBERT-based cognitive-level classifier, augmented with two linguistic complexity features, Flesch-Kincaid Grade Level (FKGL) and lexical density, alongside baseline models including Multinomial Logistic Regression. To mitigate bias in fine-grained classification, we introduce a multi-level merging strategy for the higher-order cognitive categories (Analysis, Evaluation, Creation). Contribution/Results: DistilBERT achieves 91% overall validation accuracy, significantly outperforming the baselines. Question stem length, FKGL, and lexical density generally increase with ascending Bloom levels. The merging strategy improves classification accuracy for higher-order categories by 12.3%. This work establishes a new paradigm for interpretable, objective-aligned automated assessment generation in educational AI.
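The merging idea can be illustrated with a minimal Python sketch. It assumes that merging means collapsing the three higher-order labels into one class before scoring. The label strings, the `HigherOrder` class name, and the toy predictions are all hypothetical and are not taken from the study's dataset.

```python
from collections import defaultdict

# Hypothetical label names: the summary lists Analysis, Evaluation, and Creation
# as the higher-order categories but does not give the dataset's label strings.
HIGHER_ORDER = {"Analysis", "Evaluation", "Creation"}
MERGED_LABEL = "HigherOrder"  # assumed name for the merged class


def merge_label(label: str) -> str:
    """Map a fine-grained Bloom label to its merged counterpart."""
    return MERGED_LABEL if label in HIGHER_ORDER else label


def per_class_accuracy(y_true, y_pred):
    """Return {label: share of items with that true label predicted correctly}."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {label: correct[label] / total[label] for label in total}


# Toy predictions, invented purely for illustration (not data from the study).
y_true = ["Knowledge", "Analysis", "Evaluation", "Creation", "Application"]
y_pred = ["Knowledge", "Evaluation", "Analysis", "Creation", "Application"]

fine = per_class_accuracy(y_true, y_pred)
merged = per_class_accuracy(
    [merge_label(t) for t in y_true],
    [merge_label(p) for p in y_pred],
)
```

In this toy case the confusions between Analysis and Evaluation stop counting as errors once the labels are merged, which is the kind of effect a merging strategy targets. Retraining a classifier on merged labels, rather than relabeling predictions after the fact, may behave differently.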
📝 Abstract
This study evaluates the integration of Bloom's Taxonomy into OneClickQuiz, an Artificial Intelligence (AI) driven plugin for automating Multiple-Choice Question (MCQ) generation in Moodle. Bloom's Taxonomy provides a structured framework for categorizing educational objectives into hierarchical cognitive levels. Our research investigates whether incorporating this taxonomy can improve the alignment of AI-generated questions with specific cognitive objectives. We developed a dataset of 3691 questions categorized according to Bloom's levels and evaluated how effectively four classification models categorize them: Multinomial Logistic Regression, Naive Bayes, Linear Support Vector Classification (SVC), and a Transformer-based model (DistilBERT). Our results indicate that higher Bloom's levels generally correlate with increased question length, Flesch-Kincaid Grade Level (FKGL), and Lexical Density (LD), reflecting the greater complexity of higher cognitive demands. Multinomial Logistic Regression showed varying accuracy across Bloom's levels, performing best for "Knowledge" and less accurately for higher-order levels. Merging higher-level categories improved accuracy for complex cognitive tasks. Naive Bayes and Linear SVC also classified lower levels effectively but struggled with higher-order tasks. DistilBERT achieved the highest performance, significantly improving classification of both lower and higher-order cognitive levels, with an overall validation accuracy of 91%. This study highlights the potential of integrating Bloom's Taxonomy into AI-driven assessment tools and underscores the advantages of advanced models like DistilBERT for enhancing educational content generation.
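For readers unfamiliar with the two complexity measures, the following Python sketch shows one way they could be computed. FKGL uses the standard formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a rough vowel-group heuristic, and lexical density is approximated with a small stopword list instead of part-of-speech tagging. Both simplifications, and the two sample questions, are assumptions made for illustration and not the authors' procedure.

```python
import re

# Small illustrative stopword list (an assumption); lexical density is usually
# computed from part-of-speech tags rather than a fixed list like this one.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "with", "by",
    "and", "or", "but", "is", "are", "was", "were", "be", "been", "do",
    "does", "did", "what", "which", "who", "how", "why", "that", "this",
    "it", "as", "from", "would", "should", "could", "can", "if",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_sentences(text):
    """Count runs of terminal punctuation; unpunctuated text counts as one sentence."""
    return max(1, len(re.findall(r"[.!?]+", text)))


def count_syllables(word):
    """Heuristic: count vowel groups, dropping one for a likely silent final 'e'."""
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(1, n)


def fkgl(text):
    """Flesch-Kincaid Grade Level using the heuristic syllable counter above."""
    words = tokenize(text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / count_sentences(text)
            + 11.8 * syllables / len(words)
            - 15.59)


def lexical_density(text):
    """Share of tokens that are not in the function-word list."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(w not in FUNCTION_WORDS for w in words) / len(words)


# Two invented questions, not items from the study's dataset.
recall_q = "What is the capital of France?"
evaluate_q = ("Evaluate whether the proposed caching strategy justifies "
              "its additional memory overhead.")

for q in (recall_q, evaluate_q):
    print(f"FKGL={fkgl(q):.2f}  LD={lexical_density(q):.2f}  {q}")
```

On this invented pair the evaluation-style question scores higher on both measures, which matches the direction of the reported trend. Two examples say nothing about the dataset as a whole.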