🤖 AI Summary
This work addresses the limited certified robustness of existing natural language processing models against word substitution attacks. Existing defenses typically account only for first-order sensitivity and neglect second-order curvature information. To overcome this limitation, the authors propose the Smooth Growth Bound Tensor (S-GBT), a framework that incorporates the quadratic term of the output variation into certified robustness analysis. By bounding the Hessian element-wise, S-GBT yields a second-order robustness bound, and a joint regularization term minimizes both first- and second-order sensitivities during training. The bounds are derived for LSTM and CNN architectures, and the regularizer is integrated directly into the training objective. Experiments on multiple benchmark datasets show that S-GBT improves certified robust accuracy by up to 23.4% over prior methods while keeping clean accuracy competitive.
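As a rough illustration of what a second-order bound of this kind looks like, the following is a generic Taylor-remainder argument, not the paper's own derivation. The symbols are assumptions introduced here: $f$ is a twice-differentiable scalar output, $\delta$ is the input perturbation, $G_i$ bounds the magnitude of the $i$-th partial derivative at $x$, and $H_{ij}$ bounds the magnitude of the $(i,j)$ Hessian entry along the segment from $x$ to $x+\delta$.

```latex
% Second-order Taylor expansion with Lagrange remainder (scalar output f)
\[
f(x+\delta) - f(x)
  = \nabla f(x)^{\top}\delta
  + \tfrac{1}{2}\,\delta^{\top}\,\nabla^{2} f(\xi)\,\delta,
\qquad \xi = x + t\,\delta,\quad t \in (0,1).
\]
% Element-wise bounds: |\partial_i f(x)| \le G_i and
% |\partial^2_{ij} f(\xi)| \le H_{ij} for every \xi on the segment
\[
\bigl| f(x+\delta) - f(x) \bigr|
  \;\le\; \underbrace{\sum_{i} G_i\,|\delta_i|}_{\text{linear term}}
  \;+\; \underbrace{\tfrac{1}{2}\sum_{i,j} H_{ij}\,|\delta_i|\,|\delta_j|}_{\text{quadratic term}}.
\]
```

Under these assumptions, shrinking the entries of $G$ and $H$ tightens the bound, which is the intuition behind regularizing both orders jointly.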
📝 Abstract
Despite recent progress in Natural Language Processing (NLP), models remain vulnerable to word substitution attacks. Most existing defenses focus on first-order sensitivity, which measures how much the output changes when the input is slightly perturbed. However, they ignore how this sensitivity itself varies, which is described by curvature. When gradients vary sharply, models can still fail. This paper introduces the Smooth Growth Bound Tensor (S-GBT), a second-order method that bounds the Hessian element-wise, and provides formal proofs of the resulting robustness bounds. Under this analysis, the change in the output under word substitution is bounded by both a linear term and a quadratic term. A regularization term added during training minimizes these bounds, which yields tighter certified robustness against word substitution attacks. S-GBT is derived for two architectures, Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), and is integrated directly into the training objective. Its effectiveness is evaluated on multiple benchmark datasets. The results show that combining first- and second-order regularization improves certified robust accuracy by up to 23.4% compared to prior methods, while clean accuracy remains competitive. These findings indicate that controlling both the gradient and its variation is a promising direction for building more robust models.
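To make the "linear term plus quadratic term" idea concrete, here is a minimal NumPy sketch on a toy model. It is not the paper's S-GBT construction for LSTMs or CNNs. The toy function `f(x) = a · tanh(W x)`, the helper names, and the use of the global bound `max |tanh''| = 4 / (3·sqrt(3))` to bound the Hessian element-wise are all assumptions made for illustration.

```python
import numpy as np

# Global maximum of |tanh''(z)| = |2 tanh(z) (1 - tanh(z)^2)|, attained at tanh(z) = 1/sqrt(3).
TANH_CURVATURE_MAX = 4.0 / (3.0 * np.sqrt(3.0))


def f(x, W, a):
    """Toy scalar model (hypothetical): f(x) = a . tanh(W x)."""
    return float(a @ np.tanh(W @ x))


def gradient(x, W, a):
    """Exact gradient of f at x."""
    return W.T @ (a * (1.0 - np.tanh(W @ x) ** 2))


def hessian_bound(W, a):
    """Element-wise bound H with |d^2 f / dx_i dx_j| <= H[i, j] for every input."""
    abs_W = np.abs(W)
    # H[i, j] = c * sum_k |a_k| |W[k, i]| |W[k, j]|
    return TANH_CURVATURE_MAX * (abs_W.T * np.abs(a)) @ abs_W


def second_order_bound(x, delta, W, a):
    """Upper bound on |f(x + delta) - f(x)|: linear term plus quadratic term."""
    abs_delta = np.abs(delta)
    linear = np.abs(gradient(x, W, a)) @ abs_delta
    quadratic = 0.5 * abs_delta @ hessian_bound(W, a) @ abs_delta
    return float(linear + quadratic)


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5))
a = rng.normal(size=8)
x = rng.normal(size=5)
delta = 0.3 * rng.normal(size=5)  # stands in for an embedding shift caused by a word substitution

actual_change = abs(f(x + delta, W, a) - f(x, W, a))
bound = second_order_bound(x, delta, W, a)
print(f"actual change: {actual_change:.4f}, certified bound: {bound:.4f}")
```

In a training setup along the lines the abstract describes, a penalty on quantities like the gradient magnitude and `hessian_bound` would be added to the task loss. The exact form of that regularizer is not given in the abstract, so none is shown here.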