🤖 AI Summary
Existing machine learning defense mechanisms primarily focus on the attacks themselves and struggle to identify the attackers, thereby limiting the effectiveness of system-level mitigation strategies. This work proposes the first domain-agnostic framework that shifts the defensive perspective from the attack to the attacker by modeling adversarial behavior and leveraging probabilistic inference to infer attacker characteristics without prior knowledge. Theoretical analysis shows that while attackers cannot be uniquely identified, their attributes can be characterized probabilistically. The framework is applicable across diverse learning models and attack scenarios. Experimental results demonstrate that it not only enhances the precision of exogenous mitigation strategies but also improves the performance of endogenous defense mechanisms such as adversarial regularization.
📝 Abstract
When used in automated decision-making systems, machine learning (ML) models are vulnerable to data-manipulation attacks. Some defense mechanisms (e.g., adversarial regularization) directly affect the ML models, while others (e.g., anomaly detection) act within the broader system. In this paper we consider a different task for defending against the adversary, focusing on the attacker rather than the attack. We present and demonstrate a framework for identifying characteristics of the attacker from an observed attack. We prove that, without additional knowledge, the attacker is non-identifiable (multiple potential attackers would perform the same observed attack). To address this challenge, we propose a domain-agnostic framework to identify the most probable attacker. This framework aids the defender in two ways. First, knowledge about the attacker can be leveraged for exogenous mitigation (i.e., addressing the vulnerability by altering the decision-making system outside the learning algorithm and/or limiting the attacker's capability). Second, when implementing defense methods that directly affect the learning process (e.g., adversarial regularization), knowledge of the specific attacker improves performance. We present the details of our framework and illustrate its applicability through specific instantiations on a variety of learners.
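The core idea of inferring the most probable attacker from an observed attack can be sketched with a toy Bayesian example. This is not the paper's actual method; it is a minimal illustration assuming a hypothetical setting in which each candidate attacker type induces a Gaussian distribution over the observed perturbation magnitude, and the defender applies Bayes' rule to obtain a posterior over attacker types (all names and numbers below are illustrative):

```python
import math

def gaussian_likelihood(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def attacker_posterior(observed, attacker_models, prior):
    """Posterior P(attacker | observed attack) via Bayes' rule.

    attacker_models maps an attacker name to (mean, std) of the
    perturbation magnitude that attacker type tends to induce.
    """
    unnormalized = {
        name: prior[name] * gaussian_likelihood(observed, mean, std)
        for name, (mean, std) in attacker_models.items()
    }
    total = sum(unnormalized.values())
    return {name: w / total for name, w in unnormalized.items()}

# Hypothetical attacker types and a uniform prior over them.
models = {"weak": (0.1, 0.05), "strong": (0.5, 0.1)}
prior = {"weak": 0.5, "strong": 0.5}

# An observed perturbation of magnitude 0.45 points to the "strong" type,
# but the posterior remains a distribution: consistent with the paper's
# non-identifiability result, the attacker is characterized probabilistically
# rather than pinned down uniquely.
posterior = attacker_posterior(0.45, models, prior)
most_probable = max(posterior, key=posterior.get)
```

The "most probable attacker" output could then feed either exogenous mitigation (e.g., restricting the inferred attacker's capability) or an endogenous defense tuned to that attacker type.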