Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks

📅 2025-06-04

📈 Citations: 0

✨ Influential: 0

career value

202K/year

🤖 AI Summary

Large language models (LLMs) exhibit high sensitivity to semantic-preserving input perturbations—such as spelling errors and character transpositions—while existing prompting techniques struggle to simultaneously ensure robustness and task performance. To address this, we propose RoP (Robust Prompting), a two-stage prompting strategy: (1) diversity-driven adversarial perturbation generation coupled with self-correcting prompts for error detection and correction; and (2) dynamic construction of optimal reasoning-guidance prompts based on corrected inputs. RoP is the first prompting framework to decouple error correction from reasoning guidance into a synergistic, training-free design—requiring no fine-tuning or auxiliary models. Evaluated across arithmetic, commonsense, and logical reasoning benchmarks, RoP achieves an average accuracy gain of 23.6% over strong baselines. Under typical perturbation attacks, it retains over 92% of original accuracy—a degradation of less than 3%—demonstrating substantial improvement in prompt-level robustness.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have demonstrated remarkable performance across various tasks by effectively utilizing a prompting strategy. However, they are highly sensitive to input perturbations, such as typographical errors or slight character order errors, which can substantially degrade their performance. Despite advances in prompting techniques, developing a prompting strategy that explicitly mitigates the negative impact of such perturbations remains an open challenge. To bridge this gap, we propose Robustness of Prompting (RoP), a novel prompting strategy specifically designed to enhance the robustness of LLMs. RoP consists of two stages: Error Correction and Guidance. In the Error Correction stage, RoP applies diverse perturbation methods to generate adversarial examples, which are then used to construct prompts that automatically correct input errors. In the Guidance stage, RoP generates an optimal guidance prompting based on the corrected input, steering the model toward more robust and accurate inferences. Through comprehensive experiments spanning arithmetic, commonsense, and logical reasoning tasks, we demonstrate that RoP significantly improves LLMs' robustness against adversarial perturbations. Notably, it maintains model accuracy with only minimal degradation compared to clean input scenarios, thereby establishing RoP as a practical and effective approach for enhancing LLM robustness in real-world applications.

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM robustness against input perturbations

Mitigating performance degradation from adversarial examples

Developing error-correcting prompts for stable outputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

RoP strategy enhances LLM robustness against attacks

Error Correction stage fixes input perturbations automatically

Guidance stage steers model to robust accurate inferences

🔎 Similar Papers

No similar papers found.