Adaptive Causal Alignment for High-Confidence Adversarial Training

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the vulnerability of high-confidence adversarial training to non-causal background correlations, which can induce overfitting and impair robust generalization. To mitigate this issue, the authors propose HICAT, a framework that adaptively evaluates the utility of visual context through a “measure–debias–align” pipeline, calibrating logits and disentangling foreground features. The key contributions are the identification of the dual role of background signals in high-confidence predictions, a semantic balancing mechanism that prevents the feature loss caused by indiscriminate suppression, and three components: a learnable background-bias estimator (LBBE), an adaptive debiasing module, and a foreground logit orthogonal enhancement (FLOE) loss. Experiments on CIFAR-10, CIFAR-100, and ImageNet-1K show that HICAT consistently improves over matched baselines and narrows the robust generalization gap on both CNN and ViT architectures.

📝 Abstract

Inverse adversarial training leverages high-confidence predictions to stabilize robust learning, yet we uncover a critical paradox: high confidence often stems from overfitting to non-causal background correlations rather than intrinsic object semantics. Our investigation reveals that visual context functions as a dual-natured signal, serving as either a necessary supportive prior or a spurious confounder. This insight renders existing blind suppression strategies flawed, as they inevitably lead to severe Feature Loss. To resolve this, we propose High-Confidence Causally Aligned Training (HICAT), a unified framework that establishes a Semantic Equilibrium. Operating on a “Measure-Debias-Align” pipeline, HICAT integrates a Learnable Background-Bias Estimator (LBBE) to adaptively diagnose context utility. Guided by this diagnosis, an Adaptive Debiasing mechanism performs surgical logit rectification, complemented by a geometrically grounded Foreground Logit Orthogonal Enhancement (FLOE) loss to enforce rigorous feature disentanglement. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that HICAT consistently improves over matched baselines across diverse architectures (CNNs and ViTs) while significantly reducing the robust generalization gap.
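
The abstract names the steps of the pipeline but gives no formulas, so the following is only a rough illustration of what "logit rectification" and an "orthogonal" logit loss could look like, not the authors' implementation. Everything here is an assumption made for illustration: the function names, the scalar `gate` standing in for the LBBE output, the availability of separate background logits, and the choice of squared cosine similarity as the orthogonality penalty.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences of floats."""
    return sum(a * b for a, b in zip(u, v))


def debias_logits(full_logits, background_logits, gate):
    """Hypothetical debiasing step (assumption, not from the paper).

    Subtracts a gated share of the background logits from the full-image
    logits. gate = 0 leaves the context untouched (treated as a supportive
    prior); gate = 1 removes the estimated background contribution entirely
    (treated as a spurious confounder).
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return [z - gate * b for z, b in zip(full_logits, background_logits)]


def orthogonality_penalty(foreground_logits, background_logits, eps=1e-12):
    """Hypothetical alignment penalty (assumption, not the FLOE definition).

    Returns the squared cosine similarity between the two logit vectors:
    0 when they are orthogonal, close to 1 when they are parallel.
    """
    numerator = dot(foreground_logits, background_logits)
    denominator = (
        math.sqrt(dot(foreground_logits, foreground_logits))
        * math.sqrt(dot(background_logits, background_logits))
        + eps  # guards against division by zero for all-zero vectors
    )
    return (numerator / denominator) ** 2
```

In this sketch, `debias_logits([2.0, 1.0], [1.0, 0.0], 0.5)` returns `[1.5, 1.0]`: only half of the background evidence is removed, which is one simple way to avoid the blind suppression the abstract criticizes. In a real training loop the gate would presumably be predicted per sample by a learned module rather than fixed by hand.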

Problem

Research questions and friction points this paper is trying to address.

adversarial training

causal alignment

background correlation

feature disentanglement

robust generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Alignment

Adversarial Training

Background Debiasing