🤖 AI Summary
This work proposes the first sound and complete robustness verification method against data poisoning attacks on neural network training. It unifies adversarial data perturbation, model training dynamics, and test-time evaluation in a single mixed-integer quadratically constrained programming (MIQCP) formulation. Solving this formulation to global optimality yields the worst-case attack and therefore a rigorous upper bound on the performance degradation that any poisoning attack can cause on the given training pipeline. Experiments on small-scale models show that the method fully characterizes their robustness against training-time data poisoning.
📝 Abstract
This work introduces a verification framework that provides both sound and complete guarantees against data poisoning attacks during neural network training. We formulate adversarial data manipulation, model training, and test-time evaluation as a single mixed-integer quadratically constrained programming (MIQCP) problem. Finding the global optimum of the proposed formulation provably yields the worst-case poisoning attack, while simultaneously bounding the effectiveness of all possible attacks on the given training pipeline. Our framework encodes both the gradient-based training dynamics and model evaluation at test time, enabling the first exact certification of training-time robustness. Experimental evaluation on small models confirms that our approach delivers a complete characterization of robustness against data poisoning.
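To make the idea of an exact worst-case attack concrete, here is a toy sketch. It is not the paper's MIQCP encoding, and every name and number in it (`train`, `eval_loss`, `worst_case_label_poisoning`, the data, the budget `eps`) is a hypothetical choice made for illustration. It assumes a one-parameter linear model trained by a few full-batch gradient steps, with an adversary who may shift each training label by at most `eps`. In this special case the trained weight is an affine function of the labels and the test loss is convex in the weight, so the worst case lies at a corner of the perturbation box and plain enumeration of the corners is exact. Realistic settings with feature perturbations and nonlinear networks lack this structure, which is presumably why a global solver is required there.

```python
from itertools import product

def train(xs, ys, w0=0.0, lr=0.1, steps=3):
    """Full-batch gradient descent on mean squared error for f(x) = w * x."""
    w = w0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

def eval_loss(w, x_test, y_test):
    """Squared error of the trained model on a single test point."""
    return (w * x_test - y_test) ** 2

def worst_case_label_poisoning(xs, ys, x_test, y_test, eps):
    """Enumerate the corners of the box |delta_i| <= eps and keep the worst one.

    Assumption: label-only perturbations of a linear model, where the
    maximum over the box is attained at a corner.
    """
    best_loss, best_delta = float("-inf"), None
    for delta in product((-eps, eps), repeat=len(ys)):
        poisoned = [y + d for y, d in zip(ys, delta)]
        loss = eval_loss(train(xs, poisoned), x_test, y_test)
        if loss > best_loss:
            best_loss, best_delta = loss, delta
    return best_loss, best_delta

# Hypothetical toy data: three training points on the line y = x.
xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
clean = eval_loss(train(xs, ys), 2.0, 2.0)
worst, delta = worst_case_label_poisoning(xs, ys, 2.0, 2.0, eps=0.5)
print(f"clean loss: {clean:.6f}, worst-case loss: {worst:.6f}, attack: {delta}")
```

The returned loss plays the role of a certificate in this toy setting: no label perturbation within the budget can push the test loss above it. The enumeration visits 2^n corners, so it only serves to illustrate the notion of a worst-case bound, not as a scalable method.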