Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments

📅 2025-11-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In real-world RL environments, group symmetries are often *locally* broken by dynamics, actuation constraints, or reward design. Conventional group-invariant Bellman backups then propagate these local errors globally, degrading value estimation accuracy and generalization. To address this, we propose Partially Equivariant Reinforcement Learning (PERL), built on the notion of a Partially group-Invariant Markov Decision Process (PI-MDP). PERL performs group-invariant backups only within symmetry-preserving regions and reverts to standard backups in symmetry-broken regions, suppressing error propagation. Based on this principle, we instantiate PE-DQN for discrete-action domains and PE-SAC for continuous control. Evaluated on benchmarks including Grid-World, locomotion control, and robotic manipulation, PERL achieves significantly better sample efficiency and robustness than both standard RL and state-of-the-art group-invariant baselines.

📝 Abstract
Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments almost never realize fully group-invariant MDPs; dynamics, actuation limits, and reward design usually break symmetries, often only locally. In such cases, group-invariant Bellman backups allow the errors introduced by local symmetry-breaking to propagate across the entire state-action space, resulting in global value estimation errors. To address this, we introduce the Partially group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while retaining the benefits of equivariance, thereby enhancing sample efficiency and generalizability. Building on this framework, we present practical RL algorithms, Partially Equivariant (PE)-DQN for discrete control and PE-SAC for continuous control, that combine the benefits of equivariance with robustness to symmetry-breaking. Experiments across Grid-World, locomotion, and manipulation benchmarks demonstrate that PE-DQN and PE-SAC significantly outperform baselines, highlighting the importance of selective symmetry exploitation for robust and sample-efficient RL.
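To make the selective backup concrete, here is a minimal sketch of a PE-DQN-style target computation, assuming a 90-degree rotation group (C4) acting on grid observations. The orbit construction, the value-spread symmetry test and its tolerance `tol`, and the `q_target` interface are illustrative assumptions of this sketch, not the paper's actual region detector.

```python
import torch


def rotations(obs: torch.Tensor):
    """Yield the C4 orbit of a batch of grid observations (B, C, H, W)."""
    for k in range(4):
        yield torch.rot90(obs, k, dims=(-2, -1))


def pe_dqn_target(q_target, next_obs, reward, done, gamma=0.99, tol=1e-2):
    """Selective Bellman target: group-invariant backup where symmetry
    appears to hold, standard DQN backup elsewhere (per sample in the batch)."""
    with torch.no_grad():
        # Greedy bootstrap value under each group element. Taking the max over
        # all actions is invariant to any group-induced permutation of the
        # action set, so actions need not be transformed explicitly.
        v = torch.stack([q_target(o).max(dim=-1).values
                         for o in rotations(next_obs)])   # shape (|G|, B)
        v_std = v[0]           # standard bootstrap (identity group element)
        v_inv = v.mean(dim=0)  # group-invariant bootstrap (orbit average)
        # Assumed heuristic: treat symmetry as locally intact where the greedy
        # value is nearly constant across the orbit.
        intact = (v.max(dim=0).values - v.min(dim=0).values) < tol
        bootstrap = torch.where(intact, v_inv, v_std)
        return reward + gamma * (1.0 - done) * bootstrap
```

Samples that fail the orbit-consistency check fall back to the ordinary DQN bootstrap, which is the mechanism the abstract describes for containing errors from symmetry-broken regions.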
Problem

Research questions and friction points this paper is trying to address.

Addresses global value errors from local symmetry-breaking in RL
Introduces PI-MDP for selective group-invariant Bellman backups
Enhances sample efficiency and robustness in symmetry-breaking environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selectively applies group-invariant or standard Bellman backups
Introduces Partially group-Invariant MDP (PI-MDP) framework
Develops PE-DQN and PE-SAC algorithms for discrete and continuous control
Junwoo Chang
School of Mechanical Engineering, Yonsei University, Seoul, South Korea
Minwoo Park
Department of Artificial Intelligence, Yonsei University, Seoul, South Korea
Joohwan Seo
Mechanical Engineering, UC Berkeley
Nonlinear control, Geometric control, Learning, Robotics
Roberto Horowitz
Department of Mechanical Engineering, University of California, Berkeley, United States
Jongmin Lee
Department of Artificial Intelligence, Yonsei University, Seoul, South Korea
Jongeun Choi
Professor of Mechanical Engineering, Yonsei University
Machine Learning, Robot Learning, Systems and Control, AI in Healthcare