Characterizing Knowledge Graph Tasks in LLM Benchmarks Using Cognitive Complexity Frameworks

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-KG evaluation benchmarks overemphasize answer accuracy while neglecting systematic characterization of task-level cognitive complexity, leading to fragmented capability assessment, unexamined blind spots, and insufficient task diversity. Method: This work introduces three established complexity frameworks from cognitive psychology into LLM-KG benchmark analysis for the first time, enabling multidimensional complexity modeling of knowledge graph (KG) tasks in LLM-KG-Bench. Contribution/Results: The analysis uncovers marked imbalances in the distribution of cognitive demands across current evaluation tasks, in particular the underrepresentation of higher-order reasoning and multi-step planning. These findings provide empirical grounding and a principled design paradigm for developing more interpretable, balanced, and challenging KG evaluation tasks, advancing KG evaluation from a purely outcome-oriented paradigm toward a dual-dimensional framework that jointly assesses process-level reasoning and the underlying cognitive capabilities.

📝 Abstract
Large Language Models (LLMs) are increasingly used for tasks involving Knowledge Graphs (KGs), whose evaluation typically focuses on accuracy and output correctness. We propose a complementary task characterization approach using three complexity frameworks from cognitive psychology. Applying this to the LLM-KG-Bench framework, we highlight value distributions, identify underrepresented demands and motivate richer interpretation and diversity for benchmark evaluation tasks.
Problem

Research questions and friction points this paper is trying to address.

Characterizing KG task complexity using cognitive psychology frameworks
Evaluating LLM performance beyond accuracy and correctness metrics
Identifying underrepresented cognitive demands in KG benchmark tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using cognitive complexity frameworks for task characterization
Applying frameworks to LLM-KG-Bench for analysis
Identifying underrepresented demands in benchmark evaluations
Sara Todorovikj
Chemnitz University of Technology, Germany
Lars-Peter Meyer
InfAI e.V. Leipzig
Michael Martin
Chemnitz University of Technology, Germany; InfAI, Leipzig, Germany