🤖 AI Summary
Differential Attention (DA) enhances task focus but introduces structural fragility: its subtraction-based mechanism suppresses contextual hallucinations yet significantly increases gradient norms and local Lipschitz constants, degrading adversarial robustness.
Method: We pair theoretical analysis with empirical evaluation of ViT/DiffViT and pretrained CLIP/DiffCLIP across five benchmarks, assessing adversarial vulnerability and gradient dynamics under diverse attacks.
Contribution/Results: We identify negative gradient alignment as the core mechanism underlying DA's sensitivity. We show that depth-dependent noise cancellation across stacked DA layers attenuates small perturbations and, for the first time, observe a robustness crossover as attack budgets grow. Experiments show DA suffers higher attack success rates, more frequent gradient opposition, and stronger local sensitivity than standard attention, revealing a fundamental trade-off between task focus and adversarial robustness.
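The norm amplification behind negative gradient alignment can be seen with a toy calculation. This is a minimal sketch, assuming the combined gradient of a subtractive map takes the form g = g1 − λ·g2; the vectors and the weight `lam` are illustrative, not values from the paper:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Two branch gradients pointing in opposing directions (negative alignment).
g1 = [1.0, 0.5, -0.3]
g2 = [-0.8, -0.4, 0.2]   # dot(g1, g2) < 0
lam = 0.5                # illustrative subtraction weight

# Combined gradient of the subtractive map: g = g1 - lam * g2.
g = [a - lam * b for a, b in zip(g1, g2)]

# ||g||^2 = ||g1||^2 + lam^2 * ||g2||^2 - 2 * lam * <g1, g2>;
# when <g1, g2> < 0 the cross term is positive, so the norm grows.
print(dot(g1, g2) < 0)       # True: negative alignment holds
print(norm(g) > norm(g1))    # True: combined gradient is amplified
```

The identity in the comment is the whole story: subtraction turns negative alignment between the two branches' gradients into a positive cross term, inflating the gradient norm and, with it, local sensitivity.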
📝 Abstract
Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment, a configuration encouraged by DA's subtraction, as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.
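The subtractive structure in question follows the differential attention formulation (as in the Differential Transformer), where a second softmax attention map is subtracted from the first. Below is a minimal single-head sketch; the name `lam`, the weight shapes, and the constant λ are illustrative, and the paper's actual DiffViT/DiffCLIP configurations (multi-head, learned λ, normalization) may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Single-head differential attention (illustrative sketch).

    Computes A = softmax(Q1 K1^T / sqrt(d)) - lam * softmax(Q2 K2^T / sqrt(d)),
    then applies A to the values. The subtraction cancels attention mass
    shared by the two maps (noise), but can leave negative weights --
    the structural property whose fragility the abstract analyzes.
    """
    d = Wq1.shape[1]
    A1 = softmax((X @ Wq1) @ (X @ Wk1).T / np.sqrt(d))
    A2 = softmax((X @ Wq2) @ (X @ Wk2).T / np.sqrt(d))
    return (A1 - lam * A2) @ (X @ Wv)

# Illustrative shapes: 4 tokens, width 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Ws = [rng.normal(size=(8, 8)) * 0.1 for _ in range(5)]
out = diff_attention(X, *Ws)
print(out.shape)  # (4, 8)
```

Note that each row of the combined map sums to 1 − λ rather than 1, and individual entries may be negative; both properties distinguish DA from standard softmax attention.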