Pay More Attention to the Robustness of Prompt for Instruction Data Mining

📅 2025-03-31
🤖 AI Summary
This study addresses the challenge of automatically selecting high-quality data for instruction tuning. We propose a novel data quality assessment paradigm centered on *prompt robustness*. Methodologically, we introduce the first online instruction data mining framework based on adversarial attacks: (1) generating semantically preserved but syntactically perturbed adversarial instructions; (2) modeling their *instruction-following difficulty*; and (3) measuring *embedding-space consistency* between model outputs for original and adversarial instructions—jointly forming our data filtering criterion. Experiments on two mainstream benchmarks demonstrate that our approach significantly improves downstream instruction-tuning performance (average +2.8% task accuracy), validating prompt robustness as an effective and practical proxy for data quality. The method provides a new, automated, and interpretable pathway for curating high-quality instruction data.
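The paper's implementation is not reproduced here, but the embedding-space consistency check described in step (3) can be sketched in a few lines. In this illustration, `cosine` over precomputed output embeddings and the `threshold` cutoff are assumptions; the paper's exact consistency measure and selection rule may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_by_output_consistency(samples, threshold=0.8):
    """Sketch of the embedding-consistency filter: keep instruction
    samples whose model outputs for the original and the adversarial
    instruction stay close in embedding space. Each sample carries
    precomputed output embeddings; `threshold` is a hypothetical cutoff."""
    return [s for s in samples
            if cosine(s["orig_emb"], s["adv_emb"]) >= threshold]
```

A sample surviving this filter is one whose response does not drift when the prompt is adversarially perturbed, which is the paper's operational notion of a robust (and hence high-quality) instruction.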

📝 Abstract
Instruction tuning has emerged as a paramount method for tailoring the behaviors of LLMs. Recent work has unveiled the potential for LLMs to achieve high performance through fine-tuning with a limited quantity of high-quality instruction data. Building upon this approach, we further explore the impact of prompt robustness on the selection of high-quality instruction data. This paper proposes a pioneering framework for mining high-quality online instruction data for instruction tuning, focusing on the impact of prompt robustness on the data mining process. Our notable innovation is to generate adversarial instruction data by attacking the prompts of online instruction data. We then introduce an Adversarial Instruction-Following Difficulty metric to measure how much help the adversarial instruction data provides to the generation of the corresponding response. In addition, we propose a novel Adversarial Instruction Output Embedding Consistency approach to select high-quality online instruction data. We conduct extensive experiments on two benchmark datasets to assess performance. The experimental results underscore the effectiveness of the two proposed methods, as well as the critical practical significance of considering prompt robustness.
Problem

Research questions and friction points this paper is trying to address.

Explores prompt robustness impact on instruction data selection
Proposes adversarial instruction data generation for quality mining
Introduces metrics to evaluate adversarial data utility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates adversarial instruction data via prompt attacks
Introduces Adversarial Instruction-Following Difficulty metric
Proposes Adversarial Instruction Output Embedding Consistency method
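The Adversarial Instruction-Following Difficulty metric is described only at a high level here. One plausible reading, following IFD-style scores from prior instruction-data-mining work, is a ratio of the response's average loss with versus without the adversarial instruction in context. The sketch below assumes that form and takes per-token log-probabilities as inputs; the paper's exact definition may differ.

```python
def mean_nll(token_logprobs):
    """Average negative log-likelihood of the response tokens."""
    return -sum(token_logprobs) / len(token_logprobs)

def adversarial_ifd(resp_logprobs_alone, resp_logprobs_given_adv):
    """Sketch of an IFD-style difficulty ratio (assumed form): the
    response's conditional loss given the adversarial instruction,
    divided by its loss without any instruction. Values below 1 mean
    the adversarial instruction still helps the model generate the
    response; values near or above 1 mean it provides little help."""
    return mean_nll(resp_logprobs_given_adv) / mean_nll(resp_logprobs_alone)
```

In practice the two log-probability sequences would come from scoring the same response tokens under a causal LM, once without context and once conditioned on the perturbed prompt.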
Authors

Qiang Wang — National University of Defense Technology, Changsha, Hunan 410073, China
Dawei Feng — National University of Defense Technology, Changsha, Hunan 410073, China
Xu Zhang — National University of Defense Technology, Changsha, Hunan 410073, China
Ao Shen — Purdue University (machine learning systems and architecture)
Yang Xu — National University of Defense Technology, Changsha, Hunan 410073, China
Bo Ding — National University of Defense Technology, Changsha, Hunan 410073, China
Huaimin Wang — National University of Defense Technology, Changsha, Hunan 410073, China