ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle

📅 2025-07-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large language models (LLMs) can generate "student-like" code—exhibiting authentic characteristics such as common errors, iterative refinement, stylistic diversity, and temporal evolution. To this end, we propose the first systematic framework for modeling student programming evolution, grounded in multi-semester, timestamped student submission data. Our approach integrates context-aware prompting, temporal modeling, and LLM fine-tuning. We further design a dual-granularity evaluation framework: fine-grained (semantic, functional, and stylistic fidelity) and coarse-grained (learning trajectory and error distribution). Crucially, our method explicitly models the process of struggling to learn, not merely optimal solutions. Experiments demonstrate significant improvements in the authenticity of generated code with respect to error patterns, revision pacing, and stylistic variation, validating that explicit modeling of learning dynamics is essential for producing credible student code.
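To make the coarse-grained notion of "revision pacing" concrete, here is a minimal illustrative sketch (not from the paper; function names and the similarity proxy are assumptions) that scores consecutive timestamped submissions by textual similarity, where high similarity between adjacent drafts indicates small, incremental student-like edits rather than wholesale rewrites:

```python
import difflib

def revision_similarity(prev: str, curr: str) -> float:
    """Ratio in [0, 1]; values near 1 mean a small, incremental edit."""
    return difflib.SequenceMatcher(None, prev, curr).ratio()

def pacing_profile(submissions: list[str]) -> list[float]:
    """Similarity between each pair of consecutive timestamped submissions."""
    return [revision_similarity(a, b)
            for a, b in zip(submissions, submissions[1:])]

# Hypothetical draft sequence for a single student on one problem.
drafts = [
    "def add(a, b):\n    return a - b\n",                          # sign bug
    "def add(a, b):\n    return a + b\n",                          # one-char fix
    "def add(a, b):\n    # works for floats too\n    return a + b\n",
]
profile = pacing_profile(drafts)
```

A real evaluation would replace the character-level ratio with semantic or AST-level distances, but the shape of the signal (one score per revision step, compared between real and generated trajectories) is the same.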

📝 Abstract
Large Language Models (LLMs) have shown strong performance on programming tasks, but can they generate code like real students—imperfect, iterative, and stylistically diverse? We present ParaStudent, a systematic study of LLM-based "student-like" code generation in an introductory programming course setting. Using a dataset of timestamped student submissions across multiple semesters, we design low- and high-resolution experiments to model student progress and evaluate code outputs along semantic, functional, and stylistic dimensions. Our results show that fine-tuning significantly improves alignment with real student trajectories and captures error patterns, incremental improvements, and stylistic variations more faithfully. This study shows that modeling realistic student code requires capturing learning dynamics through context-aware generation, temporal modeling, and multi-dimensional evaluation. Code for experiments and evaluation is available at https://github.com/mmiroyan/ParaStudent.
Problem

Research questions and friction points this paper is trying to address.

Generating imperfect student-like code using LLMs
Evaluating code along semantic, functional, and stylistic dimensions
Modeling student learning dynamics and error patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs for realistic student code
Context-aware generation with temporal modeling
Multi-dimensional evaluation (semantic, functional, stylistic)
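The "context-aware generation with temporal modeling" idea above can be sketched as a prompt template that conditions the model on the student's prior attempt and the feedback it received, asking for the *next* imperfect revision rather than a polished solution. This is an illustrative sketch only; the template wording and function names are assumptions, not the paper's actual prompts:

```python
def build_student_prompt(problem: str, prior_code: str,
                         feedback: str, attempt: int) -> str:
    """Assemble a context-aware prompt for the next student-like revision.

    The key design choice is conditioning on the previous attempt and its
    autograder feedback, so the model revises incrementally instead of
    emitting an optimal solution from scratch.
    """
    return (
        "You are simulating an introductory programming student.\n\n"
        f"Problem statement:\n{problem}\n\n"
        f"Your previous attempt (attempt {attempt}):\n{prior_code}\n\n"
        f"Autograder feedback:\n{feedback}\n\n"
        "Write your next attempt. Make a small, realistic revision; "
        "it may still contain mistakes.\n"
    )

# Hypothetical example inputs.
prompt = build_student_prompt(
    problem="Write add(a, b) that returns the sum of a and b.",
    prior_code="def add(a, b):\n    return a - b",
    feedback="FAILED: add(2, 3) returned -1, expected 5",
    attempt=1,
)
```

Iterating this template over a submission sequence, with each generated attempt fed back in as `prior_code`, is one simple way to realize temporal modeling at generation time.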