🤖 AI Summary
Adversarial patches trained solely in the digital domain suffer severe performance degradation when physically deployed, due to digital-to-physical domain mismatch. To address this, we propose PAPLA, a novel framework that enables end-to-end adversarial patch learning directly in the physical domain: patches are generated, projected, and optimized in real-world scenes using a projector, eliminating cross-domain transfer discrepancies. Methodologically, PAPLA introduces a differentiable physical rendering model that explicitly incorporates environmental perturbations (including illumination, viewpoint, and surface material properties) and combines measurement-driven gradient approximation with environment-robust optimization. Experiments show that PAPLA significantly outperforms conventional "digital training + physical sticker" approaches across diverse real-world settings, from laboratory environments to outdoor scenarios (e.g., vehicles and traffic signs). It enables real-time evasion attacks against detectors such as YOLO and, under specific conditions, fully eliminates cross-domain failure.
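The projector-in-the-loop idea can be illustrated with a toy optimization loop. The paper's exact algorithm is not reproduced here; this is a minimal sketch under assumed details, using SPSA-style zeroth-order gradient estimation (one plausible form of "measurement-driven gradient approximation") with simulated stand-ins `project_and_capture` and `detector_confidence` in place of the real projector/camera pipeline and object detector. All function names and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_and_capture(patch):
    # Stand-in for: display the patch with a projector, photograph the scene.
    # Simulates environmental distortion: projector gain, ambient light, noise.
    gain, ambient = 0.8, 0.1
    noise = 0.01 * rng.standard_normal(patch.shape)
    return np.clip(gain * patch + ambient + noise, 0.0, 1.0)

def detector_confidence(scene):
    # Stand-in surrogate for the detector's confidence in the target object;
    # the evasion attack minimizes this score.
    return float(np.mean(scene))

def spsa_attack(patch, steps=200, lr=0.05, c=0.05):
    # Zeroth-order optimization: the "gradient" is estimated purely from
    # two physical measurements per step, so no differentiable model of the
    # projector/camera/detector chain is required.
    for _ in range(steps):
        delta = rng.choice([-1.0, 1.0], size=patch.shape)
        f_plus = detector_confidence(
            project_and_capture(np.clip(patch + c * delta, 0.0, 1.0)))
        f_minus = detector_confidence(
            project_and_capture(np.clip(patch - c * delta, 0.0, 1.0)))
        grad_est = (f_plus - f_minus) / (2.0 * c) * delta  # SPSA estimator
        patch = np.clip(patch - lr * grad_est, 0.0, 1.0)
    return patch
```

In this simulated setting, running `spsa_attack` on an initial gray patch steadily lowers the stand-in detector score, illustrating how optimization can proceed entirely from physical measurements; in a real deployment the two captures per step would be actual photographs of the projected patch.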
📝 Abstract
The traditional learning process of patch-based adversarial attacks, conducted in the digital domain and then applied in the physical domain (e.g., via printed stickers), may suffer from reduced performance due to adversarial patches' limited transferability from the digital domain to the physical domain. Given that previous studies have considered using projectors to apply adversarial attacks, we raise the following question: can adversarial learning (i.e., patch generation) be performed entirely in the physical domain with a projector? In this work, we propose the Physical-domain Adversarial Patch Learning Augmentation (PAPLA) framework, a novel end-to-end (E2E) framework that shifts adversarial learning from the digital domain to the physical domain using a projector. We evaluate PAPLA across multiple scenarios, including controlled laboratory settings and realistic outdoor environments, demonstrating its ability to ensure attack success compared to conventional digital learning-physical application (DL-PA) methods. We also analyze the impact of environmental factors, such as projection surface color, projector strength, ambient light, distance, and the angle of the target object relative to the camera, on the effectiveness of projected patches. In addition, we demonstrate the feasibility of the attack against a parked car and a stop sign in a real-world outdoor environment. Our results show that, under specific conditions, E2E adversarial learning in the physical domain eliminates the transferability issue and ensures evasion by object detectors. Finally, we provide insights into the challenges and opportunities of applying adversarial learning in the physical domain and explain where such an approach is more effective than using a sticker.