🤖 AI Summary
This study investigates the capability of large language models (LLMs) to assess the cognitive complexity of reading comprehension items, focusing on two core dimensions: Evidence Scope (the breadth of text that must be gathered as evidence) and Transformation Level (the depth of information transformation required).
Method: Departing from conventional NLP approaches, it pioneers the use of LLMs to model implicit cognitive features that arise during human answer reasoning and are traditionally difficult to extract explicitly. Leveraging structured prompt engineering and qualitative analysis, the method enables fine-grained characterization of the latent inferential load embedded in items.
Contribution/Results: Experiments demonstrate that LLMs align well with human-annotated cognitive complexity scores, showing promise for pre-assessing item difficulty. However, LLMs exhibit marked limitations in metacognitive awareness, that is, in reliably identifying and articulating their own reasoning steps. This work establishes a novel paradigm for intelligent educational assessment, highlighting both the unique utility of LLMs in cognitive modeling and their current theoretical and practical boundaries.
📝 Abstract
Estimating the cognitive complexity of reading comprehension (RC) items is crucial for assessing item difficulty before items are administered to learners. Unlike syntactic and semantic features, such as passage length or semantic similarity between options, cognitive features that arise during answer reasoning are not readily extractable using existing NLP tools and have traditionally relied on human annotation. In this study, we examine whether large language models (LLMs) can estimate the cognitive complexity of RC items by focusing on two dimensions, Evidence Scope and Transformation Level, that indicate the degree of cognitive burden involved in reasoning about the answer. Our experimental results demonstrate that LLMs can approximate the cognitive complexity of items, indicating their potential as tools for prior difficulty analysis. Further analysis reveals a gap between LLMs' reasoning ability and their metacognitive awareness: even when they produce correct answers, they sometimes fail to correctly identify the features underlying their own reasoning process.
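The structured-prompting setup the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' actual prompts or rubrics: the dimension labels, the `build_prompt` helper, and the mocked model reply are all assumptions for demonstration; a real pipeline would send the prompt to an LLM API and parse its reply the same way.

```python
# Hypothetical sketch of prompting an LLM to rate an RC item's cognitive
# complexity on two dimensions (Evidence Scope, Transformation Level).
# The rubric labels below are illustrative, not the paper's actual scales.
import json

EVIDENCE_SCOPE = ["single sentence", "multiple sentences", "whole passage"]
TRANSFORMATION_LEVEL = ["verbatim match", "paraphrase", "inference"]

def build_prompt(passage: str, question: str, options: list[str]) -> str:
    """Assemble a structured prompt that asks for JSON-formatted ratings."""
    return (
        "Rate the cognitive complexity of this reading-comprehension item.\n"
        f"Passage: {passage}\n"
        f"Question: {question}\n"
        f"Options: {'; '.join(options)}\n"
        f"Choose an Evidence Scope from {EVIDENCE_SCOPE} and a "
        f"Transformation Level from {TRANSFORMATION_LEVEL}.\n"
        'Reply as JSON: {"evidence_scope": ..., "transformation_level": ...}'
    )

def parse_ratings(llm_output: str) -> dict:
    """Validate the model's JSON reply against the two rubrics."""
    ratings = json.loads(llm_output)
    if ratings["evidence_scope"] not in EVIDENCE_SCOPE:
        raise ValueError("invalid evidence_scope")
    if ratings["transformation_level"] not in TRANSFORMATION_LEVEL:
        raise ValueError("invalid transformation_level")
    return ratings

# Example with a mocked LLM reply (a real call would query an LLM API):
mock_reply = ('{"evidence_scope": "multiple sentences", '
              '"transformation_level": "inference"}')
ratings = parse_ratings(mock_reply)
print(ratings["evidence_scope"])  # multiple sentences
```

Requesting a JSON reply and validating it against fixed rubric labels keeps the LLM's ratings machine-comparable with human annotations, which is what alignment experiments like those in the abstract require.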