GenAI-LA: Generative AI and Learning Analytics Workshop (LAK 2026), April 27--May 1, 2026, Bergen, Norway

📅 2026-02-17

🤖 AI Summary

This work addresses the scarcity of high-quality evaluation data for instructional explanations in educational AI systems by introducing EduEVAL-DB, a dataset of 854 teaching explanations for 139 questions drawn from a curated subset of the ScienceQA benchmark. For each question, one explanation comes from a human teacher and six are generated by LLM-simulated teacher roles instantiated via prompt engineering. All explanations carry binary labels for five pedagogical risk dimensions (factual correctness, explanatory depth and completeness, focus and relevance, student-level appropriateness, and ideological bias), defined in a rubric aligned with established educational standards and assigned through a semi-automatic process with expert teacher review. Preliminary validation experiments benchmark Gemini 2.5 Pro against a lightweight local Llama 3.1 8B model and examine whether supervised fine-tuning on EduEVAL-DB supports pedagogical risk detection with models deployable on consumer hardware.

📝 Abstract

This work introduces EduEVAL-DB, a dataset based on teacher roles designed to support the evaluation and training of automatic pedagogical evaluators and AI tutors for instructional explanations. The dataset comprises 854 explanations corresponding to 139 questions from a curated subset of the ScienceQA benchmark, spanning science, language, and social science across K-12 grade levels. For each question, one human-teacher explanation is provided and six are generated by LLM-simulated teacher roles. These roles are inspired by instructional styles and shortcomings observed in real educational practice and are instantiated via prompt engineering. We further propose a pedagogical risk rubric aligned with established educational standards, operationalizing five complementary risk dimensions: factual correctness, explanatory depth and completeness, focus and relevance, student-level appropriateness, and ideological bias. All explanations are annotated with binary risk labels through a semi-automatic process with expert teacher review. Finally, we present preliminary validation experiments to assess the suitability of EduEVAL-DB for evaluation. We benchmark a state-of-the-art education-oriented model (Gemini 2.5 Pro) against a lightweight local Llama 3.1 8B model and examine whether supervised fine-tuning on EduEVAL-DB supports pedagogical risk detection using models deployable on consumer hardware.
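
To make the annotation scheme concrete, the following is a minimal sketch of how one annotated explanation might be represented, assuming a flat record with one binary label per risk dimension. The class name, field names, and dimension keys are hypothetical and chosen for illustration; they are not the dataset's actual schema.

```python
from dataclasses import dataclass, field

# The five risk dimensions named in the abstract.
# The snake_case keys are hypothetical, not the dataset's real column names.
RISK_DIMENSIONS = (
    "factual_correctness",
    "depth_and_completeness",
    "focus_and_relevance",
    "student_level_appropriateness",
    "ideological_bias",
)


@dataclass
class ExplanationRecord:
    """One explanation with a binary risk label per dimension (illustrative only)."""

    question_id: str
    source: str  # e.g. "human_teacher" or a simulated teacher role (assumed values)
    explanation: str
    risks: dict = field(default_factory=dict)  # dimension -> 0 (no risk) or 1 (risk)

    def __post_init__(self):
        # Require exactly the five dimensions, each with a binary label.
        if set(self.risks) != set(RISK_DIMENSIONS):
            raise ValueError("risks must contain exactly the five risk dimensions")
        if any(label not in (0, 1) for label in self.risks.values()):
            raise ValueError("risk labels must be binary (0 or 1)")

    def flagged(self):
        """Return the dimensions labeled as risky, in rubric order."""
        return [d for d in RISK_DIMENSIONS if self.risks[d] == 1]
```

A record built this way rejects missing dimensions or non-binary labels at construction time, so a risk detector trained or evaluated on such records can treat the task as five independent binary classifications.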

Problem

Research questions and friction points this paper is trying to address.

pedagogical evaluation

instructional explanations

AI tutors

educational risk

learning analytics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI in Education

Pedagogical Risk Assessment

Teacher Role Simulation