Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning

📅 2025-05-27
🏛️ International Conference on Learning Representations
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Offline reinforcement learning (RL) is vulnerable to data poisoning attacks, with its sequential dependencies further exacerbating robustness risks. To address this, we propose the first provably robust defense framework for offline RL, operating at two hierarchical levels: single-step action selection and global cumulative reward estimation. Our method innovatively integrates differential privacy into certified robustness analysis—enabling rigorous guarantees for both continuous and discrete state-action spaces, as well as stochastic and deterministic environments—thereby substantially broadening applicability. We introduce multi-granularity sensitivity modeling and a novel stability characterization of policy value functions. Experiments demonstrate that our approach limits performance degradation to ≤50% under 7% poisoned data, achieves a certified radius five times larger than prior work, and surpasses baseline tolerance thresholds (0.008%) by orders of magnitude. This constitutes the first robustness solution for offline RL that simultaneously offers strong theoretical guarantees and practical efficacy.

Technology Category

Application Category

📝 Abstract
Similar to other machine learning frameworks, Offline Reinforcement Learning (RL) is shown to be vulnerable to poisoning attacks, due to its reliance on externally sourced datasets, a vulnerability that is exacerbated by its sequential nature. To mitigate the risks posed by RL poisoning, we extend certified defenses to provide larger guarantees against adversarial manipulation, ensuring robustness for both per-state actions, and the overall expected cumulative reward. Our approach leverages properties of Differential Privacy, in a manner that allows this work to span both continuous and discrete spaces, as well as stochastic and deterministic environments -- significantly expanding the scope and applicability of achievable guarantees. Empirical evaluations demonstrate that our approach ensures the performance drops to no more than $50%$ with up to $7%$ of the training data poisoned, significantly improving over the $0.008%$ in prior work~citep{wu_copa_2022}, while producing certified radii that is $5$ times larger as well. This highlights the potential of our framework to enhance safety and reliability in offline RL.
Problem

Research questions and friction points this paper is trying to address.

Defending offline RL against poisoning attacks
Ensuring robustness in actions and cumulative rewards
Extending certified defenses to diverse RL environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends certified defenses for RL poisoning
Leverages Differential Privacy properties
Ensures robustness in diverse environments
🔎 Similar Papers
No similar papers found.
S
Shijie Liu
School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
A
A. C. Cullen
School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
P
Paul Montague
Defence Science and Technology Group, Adelaide, Australia
S
S. Erfani
School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
Benjamin I. P. Rubinstein
Benjamin I. P. Rubinstein
Professor, School of Computing and Information Systems, The University of Melbourne
Artificial IntelligenceDifferential PrivacyAdversarial Machine Learning