Human-AI Interaction: Evaluating LLM Reasoning on Digital Logic Circuit included Graph Problems, in terms of creativity in design and analysis

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study evaluates the reasoning reliability of large language models (GPT, Gemini, and Claude) in undergraduate digital logic circuit design and analysis tasks, with a focus on complex sequential logic problems. Through human–AI interaction experiments combining student subjective ratings and expert verification against official solutions, the work assesses model performance across multiple dimensions—including correctness, consistency, clarity, and pedagogical suitability. The findings reveal a critical systemic risk: despite producing outputs that are structurally coherent and highly confident—often misperceived by students as reliable—none of the models generated fully correct answers on any of seven challenging sequential logic problems. This discrepancy between high perceived credibility and low factual accuracy stems from an overreliance on textbook templates rather than genuine reasoning about actual circuit behavior, raising significant concerns for the use of AI in digital logic education.

📝 Abstract
Large Language Models (LLMs) are increasingly used by undergraduate students as on-demand tutors, yet their reliability on circuit- and diagram-based digital logic problems remains unclear. We present a human-AI study evaluating three widely used LLMs (GPT, Gemini, and Claude) on 10 undergraduate-level digital logic questions spanning non-standard counters, JK-based state transitions, timing diagrams, frequency division, and finite-state machines. Twenty-four students performed pairwise model comparisons, providing per-question judgments on (i) preferred model, (ii) perceived correctness, (iii) consistency, (iv) verbosity, and (v) confidence, along with global ratings of overall model quality, satisfaction across multiple dimensions (e.g., accuracy and clarity), and perceived mental effort required to verify answers. To benchmark technical validity, we applied an independent judge-based evaluation against official solutions for all ten questions, using strict correctness criteria. Results reveal a consistent gap between perceived helpfulness and formal correctness: for the most sequentially demanding problems (Q1-Q7), none of the evaluated LLMs matched the official answers, despite producing confident, well-structured explanations that students often rated favorably. Error analysis indicates that models frequently default to canonical textbook templates (e.g., standard ripple counters) and struggle to translate circuit structure into exact state evolution and timing behavior. These findings suggest that, without verification scaffolds, LLMs may be unreliable for core digital logic topics and can inadvertently reinforce misconceptions in undergraduate instruction.
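The "exact state evolution" that the judge-based evaluation checks against official solutions can be computed mechanically from a circuit's excitation equations. A minimal sketch of that idea, using a standard 3-bit synchronous binary counter built from JK flip-flops as a stand-in (the paper's actual circuits and excitation equations are not reproduced here):

```python
def jk_next(q, j, k):
    # JK flip-flop characteristic equation: Q+ = J·~Q + ~K·Q
    return (j & ~q | ~k & q) & 1

def step(state):
    # Assumed excitation equations for a 3-bit synchronous binary
    # counter: J0=K0=1, J1=K1=Q0, J2=K2=Q0·Q1
    q0, q1, q2 = state
    j0 = k0 = 1
    j1 = k1 = q0
    j2 = k2 = q0 & q1
    return (jk_next(q0, j0, k0),
            jk_next(q1, j1, k1),
            jk_next(q2, j2, k2))

def trace(state, n):
    # Apply n clock edges and record every state visited.
    seq = [state]
    for _ in range(n):
        state = step(state)
        seq.append(state)
    return seq

# Enumerate the state sequence starting from Q2Q1Q0 = 000.
states = trace((0, 0, 0), 8)
print([q2 * 4 + q1 * 2 + q0 for (q0, q1, q2) in states])
# → [0, 1, 2, 3, 4, 5, 6, 7, 0]
```

This is the step the paper reports LLMs skipping: instead of deriving the sequence from the given excitation equations, models tend to assume a canonical template and assert its textbook behavior.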
Problem

Research questions and friction points this paper is trying to address.

Human-AI Interaction
Large Language Models
Digital Logic Circuits
Graph Problems
Undergraduate Education
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-AI Interaction
Large Language Models
Digital Logic Circuits
Reasoning Evaluation
Educational Reliability
Yogeswar Reddy Thota
Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, USA
Setareh Rafatirad
Associate Professor, Computer Science Department, University of California Davis
Mobile Security · Edge Device Trust · Applied Machine Learning · Cybersecurity · HW/SW Co-Design
Houman Homayoun
Department of Electrical and Computer Engineering, University of California Davis, USA
Tooraj Nikoubin
Department of Electrical and Computer Engineering, University of Texas at Dallas, Richardson, USA