Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

📅 2025-08-16

📈 Citations: 0

✨ Influential: 0

career value

155K/year

🤖 AI Summary

This work exposes critical prompt injection robustness vulnerabilities in large language models (LLMs) when deployed as automated evaluators (“LLM-as-a-judge”), particularly against hidden malicious instructions embedded in real-world PDF documents. Method: We design a minimal yet effective evaluation paradigm: under zero-shot settings, mainstream LLMs are tasked with solving elementary arithmetic questions (e.g., “3 + 2 = ?”) presented as multiple-choice or true/false items—despite adversarial PDF formatting that subtly injects misleading prompts. Contribution/Results: Experiments reveal that even on syntactically unambiguous, low-complexity tasks, all tested models exhibit significant susceptibility to implicit prompt interference, with error rates sharply increasing—some miscomputing basic operations like 3 + 2. This is the first study to instantiate prompt injection attacks within authentic PDF documents, empirically demonstrating the severe fragility of LLM-based evaluators under input noise. The findings provide both a critical caution and a reproducible benchmark for building trustworthy AI evaluation frameworks.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have recently demonstrated strong emergent abilities in complex reasoning and zero-shot generalization, showing unprecedented potential for LLM-as-a-judge applications in education, peer review, and data quality evaluation. However, their robustness under prompt injection attacks, where malicious instructions are embedded into the content to manipulate outputs, remains a significant concern. In this work, we explore a frustratingly simple yet effective attack setting to test whether LLMs can be easily misled. Specifically, we evaluate LLMs on basic arithmetic questions (e.g., "What is 3 + 2?") presented as either multiple-choice or true-false judgment problems within PDF files, where hidden prompts are injected into the file. Our results reveal that LLMs are indeed vulnerable to such hidden prompt injection attacks, even in these trivial scenarios, highlighting serious robustness risks for LLM-as-a-judge applications.

Problem

Research questions and friction points this paper is trying to address.

Assessing LLM vulnerability to hidden prompt injection attacks

Testing robustness on simple arithmetic multiple-choice questions

Evaluating security risks in LLM-as-a-judge applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hidden prompt injection in PDFs

Testing LLMs on simple arithmetic

Revealing vulnerability in trivial scenarios

🔎 Similar Papers

No similar papers found.