Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control

📅 2025-10-15
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Offline reinforcement learning (RL) policies exhibit insufficient robustness against action-space perturbations such as actuator failures. To address this, we propose an offline-to-online adversarial fine-tuning framework, the first to introduce adversarial fine-tuning into offline RL. The framework injects controllable action perturbations and applies a performance-aware adaptive curriculum that dynamically adjusts the perturbation probability via an exponential moving average, enhancing robustness without degrading original policy performance. Our method integrates offline pretraining, online adversarial fine-tuning, and perturbation injection, and requires no additional online exploration. Evaluated on continuous-control locomotion tasks, it significantly improves disturbance resilience over purely offline baselines, converges faster than training from scratch, and achieves the strongest robustness when the fine-tuning and test perturbations match. These results validate both the effectiveness and practicality of the proposed framework.
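
As a concrete illustration of the injection step, the following Python sketch perturbs an executed action with probability p. The function name, the fault modes, and the noise bound are illustrative assumptions rather than the paper's exact perturbation model.

```python
import numpy as np

def perturb_action(action, p, rng, mode="dropout"):
    """Apply an action-space perturbation with probability p.

    Illustrative only: 'dropout' zeroes one actuator to mimic an
    actuator fault; 'noise' adds bounded uniform noise. The paper's
    exact perturbation model may differ.
    """
    action = np.asarray(action, dtype=float).copy()
    if rng.random() < p:
        if mode == "dropout":
            # Zero one randomly chosen actuator to simulate a fault.
            action[rng.integers(action.size)] = 0.0
        elif mode == "noise":
            # Bounded uniform noise as an alternative perturbation.
            action += rng.uniform(-0.3, 0.3, size=action.shape)
    return action

rng = np.random.default_rng(0)
print(perturb_action([0.4, -0.2, 0.7], p=1.0, rng=rng))
```

Because the perturbation is applied to the executed action rather than the observation, the policy is pushed to learn compensatory behavior with the remaining actuators.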

📝 Abstract
Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning, where perturbations are injected into executed actions to induce compensatory behavior and improve resilience. A performance-aware curriculum further adjusts the perturbation probability during training via an exponential-moving-average signal, balancing robustness and stability throughout the learning process. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines and converges faster than training from scratch. Matching the fine-tuning and evaluation conditions yields the strongest robustness to action-space perturbations, while the adaptive curriculum strategy mitigates the degradation of nominal performance observed with a linear curriculum strategy. Overall, the results show that adversarial fine-tuning enables adaptive and robust control in uncertain environments, bridging the gap between offline efficiency and online adaptability.
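
The performance-aware curriculum can be read as a simple feedback rule on the perturbation probability. Below is a minimal sketch of one plausible realization; the EMA coefficient, tolerance, and step size are assumed constants, not values reported by the paper.

```python
class AdaptiveCurriculum:
    """Performance-aware schedule for the perturbation probability p.

    Assumes positive episodic returns. Constants are illustrative;
    only the update direction follows the idea described above.
    """

    def __init__(self, p_max=0.5, step=0.02, beta=0.9, tolerance=0.95):
        self.p = 0.0                # current perturbation probability
        self.p_max = p_max          # cap on how adversarial training gets
        self.step = step            # per-episode adjustment of p
        self.beta = beta            # EMA smoothing coefficient
        self.tolerance = tolerance  # allowed fraction of baseline return
        self.ema = None             # smoothed episodic return
        self.baseline = None        # return level of the clean policy

    def update(self, episode_return):
        # Track a smoothed performance signal via exponential moving average.
        if self.ema is None:
            self.ema = self.baseline = episode_return
        else:
            self.ema = self.beta * self.ema + (1 - self.beta) * episode_return
        # Raise p only while performance holds up; back off otherwise.
        if self.ema >= self.tolerance * self.baseline:
            self.p = min(self.p + self.step, self.p_max)
        else:
            self.p = max(self.p - self.step, 0.0)
        return self.p
```

The design intent is that the schedule escalates difficulty only while the smoothed return stays close to the clean baseline, which is what keeps nominal performance from eroding.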
Problem

Research questions and friction points this paper is trying to address.

Enhancing robot control robustness against action-space perturbations and actuator faults
Bridging offline policy efficiency with online adaptability through adversarial fine-tuning
Balancing robustness and stability using adaptive curriculum during policy training
Innovation

Methods, ideas, or system contributions that make the work stand out (a combined sketch of these pieces follows the list).

Adversarial fine-tuning injects perturbations to induce compensatory behavior
Performance-aware curriculum adjusts the perturbation probability to balance robustness and nominal performance
Matching fine-tuning and evaluation conditions enhances action-space robustness
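
Putting the pieces together, a hypothetical fine-tuning loop could look as follows. ToyEnv and pretrained_policy are stand-ins for a MuJoCo locomotion environment and an offline-pretrained actor, the policy-update step is elided, and all constants are illustrative.

```python
import numpy as np

class ToyEnv:
    """Stand-in for a continuous-control locomotion environment."""

    def reset(self):
        self.t = 0
        return np.zeros(3)

    def step(self, action):
        self.t += 1
        reward = 1.0 - float(np.sum(np.square(np.asarray(action) - 0.1)))
        return np.zeros(3), reward, self.t >= 50

def pretrained_policy(obs):
    """Stand-in for the actor obtained by offline pretraining."""
    return np.full(3, 0.1)

def adversarial_finetune(episodes=20, seed=0):
    rng = np.random.default_rng(seed)
    env, p, ema, baseline = ToyEnv(), 0.0, None, None
    for _ in range(episodes):
        obs, done, ret = env.reset(), False, 0.0
        while not done:
            a = pretrained_policy(obs).copy()
            if rng.random() < p:               # adversarial injection
                a[rng.integers(a.size)] = 0.0  # simulated actuator fault
            obs, r, done = env.step(a)
            ret += r
            # A real implementation would update the policy on this
            # transition (e.g., an off-policy actor-critic step).
        ema = ret if ema is None else 0.9 * ema + 0.1 * ret
        baseline = ret if baseline is None else baseline
        # Escalate p only while the smoothed return stays near baseline.
        p = min(p + 0.02, 0.5) if ema >= 0.95 * baseline else max(p - 0.02, 0.0)
    return p

print(adversarial_finetune())
```

Evaluating the resulting policy under the same perturbation model used here would mirror the matched-condition setting in which the paper reports the strongest robustness.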
Authors
Shingo Ayabe
Graduate School of Science and Engineering, Chiba University, Chiba, Japan
Hiroshi Kera
Chiba University
Approximate Computer Algebra · Adversarial Machine Learning · Math Transformer
Kazuhiko Kawamoto
Graduate School of Informatics, Chiba University, Chiba, Japan