A Comparative Study of Technical Writing Feedback Quality: Evaluating LLMs, SLMs, and Humans in Computer Science Topics

📅 2025-12-01

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This study addresses the scalability challenges of providing technical writing feedback in computer science education by proposing a novel AI-assisted pedagogical paradigm that ensures privacy preservation and incurs zero marginal cost. Employing a mixed-methods approach, the research compares the quality of feedback generated by a locally deployed, quantized small language model (SLM) based on Llama-3.1, commercial large language models (e.g., GPT-4), and human instructors across programming, operating systems, and writing seminar courses. Empirical results indicate that the SLM is preferred by students for its readability and actionable suggestions, delivering feedback quality comparable to or exceeding that of commercial LLMs, while human instructors retain an advantage in highly specialized tasks. The findings validate the efficacy of a tiered collaboration model wherein AI provides structured guidance and instructors focus on higher-order conceptual instruction.

📝 Abstract

Feedback is a critical component of the learning process, particularly in computer science education. This study investigates the quality of feedback generated by Large Language Models (LLMs) and Small Language Models (SLMs), compared with human feedback, in three computer science courses with technical writing components: an introductory computer science course (CS2), a third-year advanced systems course (operating systems), and a third-year writing course (a topics course on artificial intelligence). Using a mixed-methods approach that integrates quantitative Likert-scale questions with qualitative commentary, we analyze the student perspective on feedback quality, evaluated on multiple criteria: readability, detail, specificity, actionability, helpfulness, and overall quality. The analysis reveals that in the larger upper-year operating systems course ($N=80$), SLMs and LLMs are perceived to deliver clear, actionable, and well-structured feedback, while humans provide more contextually nuanced guidance. Students in the high-enrollment CS2 course ($N=176$) showed the same preference for the AI tools' clarity and breadth, but noted that AI feedback sometimes lacked the concise, straight-to-the-point guidance offered by humans. Conversely, in the smaller upper-year technical writing course on AI topics ($N=7$), all students preferred feedback from the course instructor, who was able to provide clear, specific, and personalized feedback, compared to the more general and less targeted AI-based feedback. We also highlight the scalability of AI-based feedback by focusing on its effectiveness at large scale. Our findings underscore the potential of hybrid approaches that combine AI and human feedback to achieve efficient and high-quality feedback at scale.

Problem

Research questions and friction points this paper is trying to address.

Technical Writing Feedback

Large Language Models

Small Language Models

Privacy

Scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Small Language Model

Local Deployment

Quantized LLM