SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks (DNNs) often exhibit overconfidence on out-of-distribution inputs, revealing critical robustness deficiencies. To expose such vulnerabilities, we propose SPOOF, a minimalist black-box adversarial attack framework that elicits high-confidence misclassifications using only minimal pixel perturbations and a small query budget. SPOOF jointly leverages evolutionary optimization over CPPN-based implicit representations and direct pixel-level refinement, achieving near-100% attack success rates even on state-of-the-art vision transformers (ViTs). Extensive experiments demonstrate that prevalent retraining-based defenses offer limited resistance against SPOOF. Crucially, SPOOF is the first to systematically demonstrate that lightweight black-box attacks can breach the trust boundaries of modern vision models at extremely low computational cost, requiring only tens of queries and seconds of runtime. This work establishes a new practical benchmark for evaluating real-world model robustness and provides an accessible, scalable tool for stress-testing deployed vision systems.
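To make the attack pattern concrete, here is a minimal sketch of the kind of greedy black-box pixel search the summary describes: mutate a few pixels at a time and keep a mutation only when the queried confidence rises. The `query_confidence` oracle is a hypothetical stand-in (a real attack would call the target classifier), and all parameter values are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def query_confidence(image: np.ndarray, target: int) -> float:
    """Hypothetical black-box oracle: the classifier's softmax confidence
    for `target` on `image`. This toy stand-in just hashes the pixels so
    the sketch runs end to end; swap in a real model call."""
    rng = np.random.default_rng(hash(image.tobytes()) % (2**32))
    return float(rng.random())

def pixel_hill_climb(shape=(224, 224, 3), target=0,
                     max_queries=100, pixels_per_step=16, seed=0):
    """Greedy black-box fooling: start from noise, mutate a few random
    pixels per step, keep the mutant only if confidence improves."""
    rng = np.random.default_rng(seed)
    image = rng.integers(0, 256, size=shape, dtype=np.uint8)
    best = query_confidence(image, target)
    for _ in range(max_queries - 1):
        candidate = image.copy()
        ys = rng.integers(0, shape[0], size=pixels_per_step)
        xs = rng.integers(0, shape[1], size=pixels_per_step)
        candidate[ys, xs] = rng.integers(
            0, 256, size=(pixels_per_step, shape[2]), dtype=np.uint8)
        score = query_confidence(candidate, target)
        if score > best:  # greedy acceptance; no gradient access needed
            image, best = candidate, score
    return image, best

if __name__ == "__main__":
    img, conf = pixel_hill_climb()
    print(f"confidence after 100 queries: {conf:.3f}")
```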

📝 Abstract
Deep neural networks (DNNs) excel across image recognition tasks, yet continue to exhibit overconfidence on inputs that bear no resemblance to natural images. Revisiting the "fooling images" work introduced by Nguyen et al. (2015), we re-implement both CPPN-based and direct-encoding evolutionary fooling attacks on modern architectures, including convolutional and transformer classifiers. Our re-implementation confirms that high-confidence fooling persists even in state-of-the-art networks, with the transformer-based ViT-B/16 emerging as the most susceptible: it yields near-certain misclassifications with substantially fewer queries than convolution-based models. We then introduce SPOOF, a minimalist, consistent, and more efficient black-box attack that generates high-confidence fooling images. Despite its simplicity, SPOOF produces unrecognizable fooling images with minimal pixel modifications and drastically reduced compute. Furthermore, retraining with fooling images as an additional class provides only partial resistance: SPOOF continues to fool the retrained models consistently with only slightly higher query budgets, highlighting the persistent fragility of modern deep classifiers.
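For readers unfamiliar with the CPPN encoding mentioned above: a compositional pattern-producing network maps each pixel's coordinates through a small network of smooth, periodic activations to a color, and evolution searches over the network's parameters (the original attacks also evolve topology, NEAT-style). The fixed-topology toy below is an illustrative assumption, not the paper's implementation; in a real attack, `fitness` would return the target classifier's confidence in some class.

```python
import numpy as np

def render_cppn(weights, size=64):
    """Render an image by evaluating a tiny fixed-topology CPPN at every
    pixel coordinate (x, y) plus the radial distance r."""
    xs, ys = np.meshgrid(np.linspace(-1, 1, size), np.linspace(-1, 1, size))
    coords = np.stack([xs, ys, np.hypot(xs, ys)], axis=-1)  # (size, size, 3)
    w1, w2 = weights
    hidden = np.sin(coords @ w1)               # periodic activation -> regular patterns
    return np.tanh(hidden @ w2) * 0.5 + 0.5    # RGB values in [0, 1]

def evolve_fooling_image(fitness, generations=50, hidden=8, seed=0):
    """(1+1) evolution strategy over CPPN weights: keep a mutant only if
    the fitness (classifier confidence) improves."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0, 1, (3, hidden)), rng.normal(0, 1, (hidden, 3))]
    best = fitness(render_cppn(weights))
    for _ in range(generations):
        mutant = [w + rng.normal(0, 0.1, w.shape) for w in weights]
        score = fitness(render_cppn(mutant))
        if score > best:
            weights, best = mutant, score
    return render_cppn(weights), best

# Stand-in fitness (mean brightness) so the sketch runs; a real attack
# would query the model's confidence in a target class instead.
img, score = evolve_fooling_image(lambda im: float(im.mean()))
```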
Problem

Research questions and friction points this paper is trying to address.

Addresses overconfidence in DNNs on non-natural images
Introduces SPOOF for efficient black-box fooling attacks
Highlights persistent fragility despite retraining defenses
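The retraining defense referenced in the last point is the classic one: fold previously generated fooling images into training as an extra (N+1)-th class. As a rough sketch of what that involves, assuming a torchvision ResNet-50 purely for illustration (the paper's exact models and training setup may differ):

```python
import torch
import torch.nn as nn
from torchvision import models

# Defense sketch: widen a 1000-class ImageNet classifier by one extra
# "fooling" class, then fine-tune on natural + fooling images.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
old_fc = model.fc
model.fc = nn.Linear(old_fc.in_features, 1001)   # class 1000 = "fooling"
with torch.no_grad():
    model.fc.weight[:1000] = old_fc.weight       # keep the learned classes
    model.fc.bias[:1000] = old_fc.bias

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# One illustrative step with dummy data: a batch of fooling images all
# labeled with the new class index 1000.
fooling_batch = torch.rand(8, 3, 224, 224)
labels = torch.full((8,), 1000, dtype=torch.long)
optimizer.zero_grad()
loss = criterion(model(fooling_batch), labels)
loss.backward()
optimizer.step()
```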
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simple pixel operations for fooling images
Black-box attack with minimal pixel modifications
Efficient fooling with drastically reduced compute
Ankit Gupta
Department of Computer Science & Engineering, Michigan State University
Christoph Adami
Professor of Microbiology, Genetics & Immunology, Physics & Astronomy, Michigan State University
Artificial Intelligence · Evolutionary Biology · Neuroscience · Information Theory · Quantum Physics
Emily Dolson
Department of Computer Science & Engineering, Michigan State University; Program in Evolution, Ecology, & Behavior, Michigan State University