🤖 AI Summary
This study addresses the vulnerability of machine learning–based ballot classifiers to physical adversarial attacks, which poses a threat to electoral integrity. It pioneers the integration of probabilistic election outcome models with physical adversarial example evaluation, moving beyond prior work that focused exclusively on razor-thin margins. The authors generate and scan 144,000 physically realizable adversarial ballots using six attack methods—including multiple ℓp-constrained variants of APGD and PGD—and systematically assess their effectiveness across four classifiers in both digital and physical domains. The findings reveal that the most effective attacks in the physical domain are ℓ1- or ℓ2-based (depending on the model), contrasting with the ℓ2/ℓ∞ dominance observed in the digital domain. Moreover, the work precisely quantifies the minimum number of adversarial ballots required to flip an election outcome.
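The ℓ∞-constrained PGD variant named above can be illustrated with a generic sketch (not the authors' implementation): take signed-gradient ascent steps on the classifier loss, then project the perturbation back into an ℓ∞ ball around the original image. The toy linear loss, weight vector `w`, and all parameter values below are illustrative assumptions; a real attack would backpropagate through the ballot classifier.

```python
import numpy as np

# Hypothetical toy setup: a flattened "ballot image" x in [0, 1]^64 and a
# toy linear loss f(x) = w . x standing in for the classifier loss.
rng = np.random.default_rng(0)
w = rng.normal(size=64)          # illustrative weight vector
x = rng.uniform(0.0, 1.0, size=64)

def loss_grad(x_adv):
    # Gradient of the toy linear loss; a real attack would compute the
    # gradient of the classifier's loss with respect to the input.
    return w

def pgd_linf(x, eps=0.05, step=0.01, iters=40):
    """Generic l-infinity PGD: ascend the loss via gradient-sign steps,
    projecting the perturbation back into the eps-ball after each step."""
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(loss_grad(x_adv))  # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)          # project to l-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                  # keep valid pixel range
    return x_adv

x_adv = pgd_linf(x)
print(np.max(np.abs(x_adv - x)))  # perturbation never exceeds eps
```

The ℓ1 and ℓ2 variants differ only in the step normalization and the projection set; APGD additionally adapts the step size across iterations.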
📝 Abstract
Developments in the machine learning voting domain have shown both promising results and risks. Trained models perform well on ballot classification tasks (> 99% accuracy) but are vulnerable to adversarial example attacks that cause misclassifications. In this paper, we analyze an attacker who seeks to deploy adversarial examples against machine learning ballot classifiers to compromise a U.S. election. We first derive a probabilistic framework for determining the number of adversarial example ballots that must be printed to flip an election, in terms of the probability of each candidate winning and the total number of ballots cast. Second, we address an open question: which type of adversarial example is most effective when physically printed in the voting domain? We analyze six attacks: ℓ∞-APGD, ℓ2-APGD, ℓ1-APGD, ℓ0-PGD, ℓ0+ℓ∞-PGD, and ℓ0+σ-map PGD. Our experiments physically realize 144,000 adversarial examples through printing and scanning, evaluated against four different machine learning models. We empirically demonstrate an analysis gap between the physical and digital domains: the attacks most effective in the digital domain (ℓ2 and ℓ∞) differ from those most effective in the physical domain (ℓ1 and ℓ2, depending on the model). By unifying a probabilistic election framework with digital and physical adversarial example evaluations, we move beyond prior close-race analyses to explicitly quantify when and how adversarial ballot manipulation could alter outcomes.
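The paper's framework is probabilistic, expressed in terms of each candidate's win probability and the total ballots cast; the sketch below is a much simpler deterministic toy for intuition only. It assumes a two-candidate race in which each successful adversarial ballot converts one vote for the leader into a vote for the trailer (closing the margin by 2), with an independent per-ballot physical success rate. The function name and parameters are illustrative, not the paper's notation.

```python
import math

def adversarial_ballots_needed(margin, success_rate):
    """Toy estimate of adversarial ballots to print to overturn a
    two-candidate race (illustrative assumption, not the paper's model).

    margin: leader's vote margin (leader_votes - trailer_votes), > 0.
    success_rate: probability in (0, 1] that a printed adversarial ballot
        is misclassified as the trailing candidate after scanning.

    Each successful flip closes the margin by 2, so floor(margin/2) + 1
    flips overturn the result; divide by the success rate to get the
    expected number of ballots an attacker must print.
    """
    flips_required = margin // 2 + 1
    return math.ceil(flips_required / success_rate)

# Example: a 1,000-vote margin with a 40% physical success rate.
print(adversarial_ballots_needed(1000, 0.4))  # 1253
```

A perfect success rate reduces this to the bare flip count (e.g., 51 ballots for a 100-vote margin); lower physical success rates inflate the requirement proportionally, which is why the digital-to-physical gap measured in the paper matters.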