🤖 AI Summary
This study addresses the vulnerability of large language model (LLM)-based automated scoring systems to prompt injection attacks, which pose significant threats to the fairness and reliability of educational assessment. It presents the first systematic investigation into the effectiveness of such attacks under rubric-based evaluation scenarios and evaluates the protective capabilities of existing defense mechanisms. By constructing a simulated attack-and-defense experimental framework, the research demonstrates that current LLM scoring systems are highly susceptible to these adversarial manipulations, while prevailing defense strategies offer only limited mitigation. The findings expose critical security weaknesses in AI-driven educational tools and provide empirical grounding and urgent safety considerations for the development of more robust and trustworthy automated scoring systems.
📝 Abstract
The emergence of large language models (LLMs) has significantly accelerated recent research on LLM-based automatic grading (AG) systems. Benefiting from the strong instruction-following capabilities and broad prior knowledge of LLMs, educators can deploy AG systems across diverse tasks using only natural language rubrics while achieving satisfactory grading performance. Despite these advantages, new security concerns may also arise. In particular, prompt injection (PI) attacks have recently become a major threat to LLM-based applications. In the context of AG, attackers can potentially exploit PI vulnerabilities to manipulate grading systems into assigning artificially high scores regardless of the actual answer quality. Such behavior poses serious risks to the fairness, reliability, and integrity of educational assessment. In this work, we study PI attacks in AG systems, and systematically investigate the effectiveness of such attacks in educational scenarios. We further evaluate the effectiveness of existing defensive strategies against these attacks. Through comprehensive experiments under rubric-based grading settings, we demonstrate that current LLM-based AG systems remain highly vulnerable to PI attacks. We hope that our findings raise awareness of this emerging threat and motivate future research toward secure, robust, and trustworthy LLM-based educational systems.