Survival Reinforcement Learning: Toward Scalable Self-Supervised RL

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a performance bottleneck in self-supervised Contrastive Reinforcement Learning (CRL) on long-horizon goal-conditioned planning, which the authors attribute to the uniformity-tolerance dilemma inherent in contrastive losses. The paper proposes Survival Reinforcement Learning (SRL), an online classification-based alternative that extends the survival value learning framework and trains the agent to maximize its dwell time at target goals. This design bypasses the structural constraints of CRL and mitigates the “bang-bang” control solutions that survival frameworks tend to produce, which often induce undesirable behavior in complex dynamical systems. Across diverse robotic benchmarks, scaled SRL matches state-of-the-art CRL on manipulation tasks and outperforms it by 2x to 8x on stable, long-horizon locomotion tasks. The authors present this as additional evidence that classification-based methods may be a key primitive for scaling reinforcement learning.
📝 Abstract
While self-supervised Contrastive Reinforcement Learning (CRL) has shown remarkable depth-scaling capabilities, successfully using networks over 64 layers, scaled CRL still struggles with long-horizon goal-conditioned planning due to the uniformity-tolerance dilemma inherent in contrastive losses. We introduce Survival Reinforcement Learning (SRL), an online classification-based alternative that extends the survival value learning framework by maximizing the agent's dwell time at target goals. SRL bypasses the structural constraints of CRL and mitigates the "bang-bang" control solutions inherent to survival frameworks, which often induce undesirable behavior in complex dynamical systems. Evaluated across diverse robotic benchmarks, scaled SRL matches state-of-the-art CRL on manipulation tasks and outperforms it by 2x to 8x on stable, long-horizon locomotion tasks. Our results provide strong additional evidence that classification-based methods may serve as a key primitive in the broader effort to scale reinforcement learning.
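To make the notion of "dwell time at target goals" concrete, here is a minimal Python sketch of how such a quantity could be measured on a recorded trajectory. It is an illustration only, not the paper's objective or training procedure: the function names, the Euclidean distance test, and the `radius` tolerance are all assumptions introduced for this example.

```python
from math import dist


def dwell_time(states, goal, radius):
    """Count the timesteps whose state lies within `radius` of `goal`.

    Assumption: "being at the goal" is modeled as a Euclidean ball.
    """
    return sum(1 for s in states if dist(s, goal) <= radius)


def longest_dwell(states, goal, radius):
    """Length of the longest consecutive run of timesteps spent at the goal."""
    best = run = 0
    for s in states:
        # Extend the current run while inside the goal region, else reset it.
        run = run + 1 if dist(s, goal) <= radius else 0
        best = max(best, run)
    return best


# Hypothetical 2-D trajectory: reaches the goal, overshoots, then returns.
trajectory = [
    (0.0, 0.0), (0.5, 0.5), (1.0, 1.0),
    (1.05, 1.0), (2.0, 2.0), (1.0, 0.95),
]
goal = (1.0, 1.0)

total = dwell_time(trajectory, goal, radius=0.1)
longest = longest_dwell(trajectory, goal, radius=0.1)
print(total, longest)
```

Under these assumptions, an agent that reaches the goal and then overshoots scores lower on the consecutive measure than one that stays put, which is one way to see why a dwell-time objective favors stable behavior over merely touching the goal once.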
Problem

Research questions and friction points this paper is trying to address.

Contrastive Reinforcement Learning
long-horizon planning
uniformity-tolerance dilemma
bang-bang control
survival value learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Survival Reinforcement Learning
Classification-based RL
Contrastive Reinforcement Learning
Long-horizon Planning
Scalable RL
🔎 Similar Papers
2023-06-06, International Conference on Learning Representations, Citations: 4