Online Learning of Strategic Defense against Ecological Adversaries under Partial Observability with Semi-Bandit Feedback

📅 2026-03-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses defensive resource allocation in security games set in ecological domains, where adversary behavior is unknown and observations are partial, so classical equilibrium-based approaches do not apply. The authors propose HERDS, an algorithm built on the Follow-the-Perturbed-Leader (FPL) framework that learns online from semi-bandit feedback over a combinatorial action space to minimize regret. Its key innovations are dynamic partitioning of the budget between exploration and exploitation, adaptive payoff estimation when attack entry points cannot be identified, and model-agnostic learning that makes no assumptions about adversary behavior. Evaluated on human-elephant conflict mitigation, HERDS reduces regret by 15–45% relative to FPL with Uniform Exploration (FPL-UE), reduces crop damage by 40–50% against adaptive adversaries, and converges in 40–50 rounds, compared with 60–80 rounds for baseline methods.
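
For readers unfamiliar with the objective, regret in a combinatorial semi-bandit problem is commonly written as below. The notation is an assumption for illustration and is not taken from the paper: $\mathcal{S}$ is the set of feasible guard deployments, $S_t$ the deployment chosen in round $t$, and $r_t(i)$ the payoff at target $i$. The additive form is also a simplification, since the paper describes payoffs as interdependent across targets.

```latex
R_T \;=\; \max_{S \in \mathcal{S}} \sum_{t=1}^{T} \sum_{i \in S} r_t(i)
\;-\; \mathbb{E}\!\left[ \sum_{t=1}^{T} \sum_{i \in S_t} r_t(i) \right]
```

Under semi-bandit feedback the learner observes $r_t(i)$ only for the targets $i \in S_t$ it actually covered in round $t$.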

📝 Abstract

We introduce an online learning algorithm for computing adaptive resource allocation policies against strategic ecological adversaries with unknown behavioral models and partial observability. Our setting addresses a fundamental limitation of security games: when adversary behavior cannot be modeled a priori, classical equilibrium-based approaches fail. We formulate the problem as regret minimization in a combinatorial action space with semi-bandit feedback, where payoffs are non-stationary and interdependent across targets. Our algorithm, named HERDS (Human-Elephant conflict mitigation through Resource Deployment for Strategic guarding), extends Follow-the-Perturbed-Leader (FPL) with three innovations: (1) simultaneous exploration-exploitation through dynamic budget partitioning driven by observed losses, (2) adaptive payoff estimation under confounded observations where attack entry points are unidentifiable, and (3) model-agnostic learning that provides regret guarantees without behavioral assumptions. We demonstrate our framework on Human-Elephant Conflict mitigation, a domain where intelligent ecological adversaries exhibit strategic behavior (optimal foraging, spatial memory, adaptive evasion) yet lack tractable behavioral models. Experiments using an Agent-Based Model calibrated with elephant movement data demonstrate 15–45% regret reduction versus Follow-the-Perturbed-Leader with Uniform-Exploration (FPL-UE), 40–50% crop damage reduction against adaptive adversaries, and convergence in 40–50 rounds versus 60–80 for baselines.
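
To make the FPL starting point concrete, here is a minimal sketch of plain Follow-the-Perturbed-Leader with semi-bandit feedback, where the defender guards `k` of `n_targets` each round. It is not HERDS: it omits the dynamic budget partitioning and the handling of confounded observations described above. All function names, parameters (`eta`, `max_draws`), and the `reward_fn` interface are hypothetical, and the inverse-probability weights are estimated with geometric resampling, a standard technique for semi-bandit FPL that is assumed here rather than taken from the paper.

```python
import numpy as np


def fpl_select(est_rewards, k, eta, rng):
    """Return the k targets with the largest perturbed cumulative estimates."""
    noise = rng.exponential(scale=1.0 / eta, size=est_rewards.shape)
    return np.argsort(est_rewards + noise)[-k:]


def geometric_resampling(est_rewards, k, eta, rng, target, max_draws):
    """Estimate 1 / P(target is selected) by counting fresh draws until
    the target is selected again, truncated at max_draws."""
    for draws in range(1, max_draws + 1):
        if target in fpl_select(est_rewards, k, eta, rng):
            return draws
    return max_draws


def run_fpl_semi_bandit(reward_fn, n_targets, k, horizon,
                        eta=0.1, max_draws=50, seed=0):
    """Run FPL for `horizon` rounds; reward_fn(t) returns a length-n_targets
    array, of which only the entries for guarded targets are used."""
    rng = np.random.default_rng(seed)
    est_rewards = np.zeros(n_targets)  # importance-weighted cumulative estimates
    total = 0.0
    for t in range(horizon):
        chosen = fpl_select(est_rewards, k, eta, rng)
        rewards = reward_fn(t)
        # Compute all weights from the pre-update estimates of this round.
        weights = {
            int(i): geometric_resampling(est_rewards, k, eta, rng, i, max_draws)
            for i in chosen
        }
        for i, w in weights.items():
            total += rewards[i]             # semi-bandit: observed targets only
            est_rewards[i] += w * rewards[i]
    return est_rewards, total
```

A call such as `run_fpl_semi_bandit(lambda t: np.array([0.9, 0.8, 0.1, 0.1, 0.1]), n_targets=5, k=2, horizon=200)` runs the learner against a stationary toy payoff vector. The paper's setting is non-stationary and adversarial, so this is only a way to exercise the code, not a reproduction of its experiments.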

Problem

Research questions and friction points this paper is trying to address.

online learning

strategic adversaries

partial observability

semi-bandit feedback

ecological conflict

Innovation

Methods, ideas, or system contributions that make the work stand out.

online learning

semi-bandit feedback

partial observability