🤖 AI Summary
This work addresses defensive resource allocation in security games under ecological settings where adversary behavior is unknown and observations are limited, which renders traditional equilibrium-based approaches ineffective. The authors propose HERDS, an algorithm built on the Follow-the-Perturbed-Leader (FPL) framework that learns online from semi-bandit feedback over a combinatorial action space to minimize regret. Key innovations include dynamic partitioning of the budget between exploration and exploitation, adaptive payoff estimation when attack entry points cannot be identified, and model-agnostic learning that provides regret guarantees without behavioral assumptions about the adversary. Evaluated on human-elephant conflict mitigation with an Agent-Based Model calibrated on elephant movement data, HERDS reduces regret by 15–45% relative to FPL with Uniform Exploration (FPL-UE), reduces crop damage by 40–50% against adaptive adversaries, and converges in 40–50 rounds versus 60–80 for baselines.
📝 Abstract
We introduce an online learning algorithm for computing adaptive resource allocation policies against strategic ecological adversaries with unknown behavioral models and partial observability. Our setting addresses a fundamental limitation of security games: when adversary behavior cannot be modeled a priori, classical equilibrium-based approaches fail. We formulate the problem as regret minimization in a combinatorial action space with semi-bandit feedback, where payoffs are non-stationary and interdependent across targets. Our algorithm, named HERDS (Human-Elephant conflict mitigation through Resource Deployment for Strategic guarding), extends Follow-the-Perturbed-Leader (FPL) with three innovations: (1) simultaneous exploration-exploitation through dynamic budget partitioning driven by observed losses, (2) adaptive payoff estimation under confounded observations where attack entry points are unidentifiable, and (3) model-agnostic learning that provides regret guarantees without behavioral assumptions. We demonstrate our framework on Human-Elephant Conflict mitigation, a domain where intelligent ecological adversaries exhibit strategic behavior (optimal foraging, spatial memory, adaptive evasion) yet lack tractable behavioral models. Experiments using an Agent-Based Model calibrated with elephant movement data demonstrate 15–45% regret reduction versus Follow-the-Perturbed-Leader with Uniform-Exploration (FPL-UE), 40–50% crop damage reduction against adaptive adversaries, and convergence in 40–50 rounds versus 60–80 for baselines.
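To make the setting concrete, the following is a minimal sketch of generic Follow-the-Perturbed-Leader with uniform exploration under semi-bandit feedback, in the spirit of the FPL-UE baseline. It is not the HERDS algorithm: it omits dynamic budget partitioning and the handling of confounded observations. All names (`choose`, `geometric_resample`, `run`, `reward_fn`) and the parameters `eta`, `gamma`, and `cap` are illustrative assumptions, and the inverse selection probability is estimated with geometric resampling, a standard technique swapped in here for illustration.

```python
import random

def fpl_choose(est, k, eta, rng):
    """Pick the k targets with the highest estimated cumulative reward plus noise."""
    # Exponential perturbation with rate eta (mean 1/eta); an assumed noise choice.
    perturbed = [e + rng.expovariate(eta) for e in est]
    ranked = sorted(range(len(est)), key=lambda i: perturbed[i], reverse=True)
    return set(ranked[:k])

def choose(est, k, eta, gamma, rng):
    """With probability gamma guard a uniformly random subset, else follow the perturbed leader."""
    if rng.random() < gamma:
        return set(rng.sample(range(len(est)), k))
    return fpl_choose(est, k, eta, rng)

def geometric_resample(i, est, k, eta, gamma, rng, cap):
    """Estimate 1 / P(target i is guarded) as the number of redraws until i reappears, capped."""
    for n in range(1, cap + 1):
        if i in choose(est, k, eta, gamma, rng):
            return n
    return cap

def run(reward_fn, n_targets, k, rounds, eta=1.0, gamma=0.1, cap=50, seed=0):
    """Play `rounds` rounds; reward_fn(t, i) is the hypothetical payoff of guarding target i at round t."""
    rng = random.Random(seed)
    est = [0.0] * n_targets  # importance-weighted cumulative reward estimates
    total = 0.0
    for t in range(rounds):
        snapshot = list(est)  # the policy used for this round's draw and resampling
        chosen = choose(snapshot, k, eta, gamma, rng)
        for i in chosen:
            # Semi-bandit feedback: the reward is observed only for guarded targets.
            r = reward_fn(t, i)
            total += r
            est[i] += r * geometric_resample(i, snapshot, k, eta, gamma, rng, cap)
    return total, est
```

As a usage example, `run(lambda t, i: 1.0 if i < 2 else 0.0, n_targets=5, k=2, rounds=300)` simulates a stationary toy adversary that only ever attacks the first two of five targets; the learner should concentrate its two guards on those targets after a short exploration phase. The non-stationary, interdependent payoffs described in the abstract would require a richer `reward_fn` than this toy case.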