🤖 AI Summary
This work addresses the inefficiency of existing hard-label text attack methods, which typically employ an "outside-in" strategy characterized by large search spaces and high query costs. To overcome these limitations, the authors propose PivotAttack, a novel "inside-out" attack framework that introduces pivot sets (combinatorial phrases serving as predictive anchors) to guide adversarial perturbations. By leveraging inter-word dependencies to reconstruct attack trajectories and employing a multi-armed bandit algorithm to efficiently identify and perturb critical pivot sets, PivotAttack effectively induces label flips. Extensive experiments demonstrate that this approach substantially outperforms state-of-the-art methods across both traditional models and large language models, achieving significant improvements in both attack success rate and query efficiency.
📝 Abstract
Existing hard-label text attacks often rely on inefficient "outside-in" strategies that traverse vast search spaces. We propose PivotAttack, a query-efficient "inside-out" framework. It employs a Multi-Armed Bandit algorithm to identify Pivot Sets (combinatorial token groups acting as prediction anchors) and strategically perturbs them to induce label flips. This approach captures inter-word dependencies and minimizes query costs. Extensive experiments across traditional models and Large Language Models demonstrate that PivotAttack consistently outperforms state-of-the-art baselines in both Attack Success Rate and query efficiency.
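To make the bandit component concrete, the sketch below shows one way a multi-armed bandit could allocate a fixed query budget across candidate token groups to find the one whose perturbation most reliably flips the hard label. This is an illustrative UCB1 sketch, not the authors' actual algorithm: the names `bandit_pivot_search` and `query_victim`, the UCB1 policy, and the binary flip reward are all assumptions made for the example.

```python
import math

def ucb1_select(counts, rewards, t):
    """Pick the arm with the highest UCB1 score; play each unplayed arm first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: rewards[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
    )

def bandit_pivot_search(candidate_sets, query_victim, budget):
    """Spend `budget` hard-label queries learning which candidate token group
    (a would-be pivot set) most often flips the label when perturbed.

    `query_victim(token_group)` is a hypothetical oracle: it perturbs the
    group in the input, queries the victim model once, and returns True
    if the predicted label flipped.
    """
    k = len(candidate_sets)
    counts, rewards = [0] * k, [0.0] * k
    for t in range(1, budget + 1):
        arm = ucb1_select(counts, rewards, t)
        flipped = query_victim(candidate_sets[arm])  # one query per pull
        counts[arm] += 1
        rewards[arm] += 1.0 if flipped else 0.0
    # Return the candidate with the best empirical flip rate.
    best = max(range(k), key=lambda i: rewards[i] / max(counts[i], 1))
    return candidate_sets[best]
```

Under this framing, query efficiency comes from the bandit concentrating its pulls on promising token groups instead of exhaustively scoring every word, which matches the paper's stated goal of shrinking the "outside-in" search space.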