🤖 AI Summary
In asymmetric Actor-Critic architectures, reducing Actor capacity often induces value underestimation, leading to biased policy sampling, Critic overfitting, and degraded overall performance. This work is the first to systematically trace the root cause to underestimation-driven data inefficiency. We propose an optimistic Critic mechanism that introduces a regularization-induced positive bias into the Critic’s target, actively encouraging exploration of high-value regions and mitigating underestimation bias. Our method adds no parameters to the Actor and integrates seamlessly into standard training pipelines. Evaluated on multiple continuous-control benchmarks, it significantly accelerates convergence and improves the final performance of compact Actor models, while effectively suppressing Critic overfitting. These results validate the critical role of optimistic value estimation in stabilizing the training of asymmetric architectures and establish a scalable paradigm for lightweight deep reinforcement learning.
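To make the idea concrete, here is a minimal sketch of an "optimistic" TD target with a positive bias term. The function name, the `beta`-scaled bonus form, and all hyperparameter values are illustrative assumptions, not the paper's exact formulation:

```python
def optimistic_td_target(reward, next_q, done, gamma=0.99, beta=0.1):
    """Standard TD target plus a small positive bias (beta * |next_q|)
    that nudges the Critic's value estimates upward to counter the
    underestimation observed with small actors.
    NOTE: hypothetical sketch; the bonus form is an assumption."""
    if done:
        return reward  # no bootstrapping at terminal states
    # Inflate the bootstrap value slightly to induce optimism.
    return reward + gamma * (next_q + beta * abs(next_q))
```

Setting `beta = 0` recovers the standard one-step TD target, so the optimism strength can be annealed or tuned like any other regularization coefficient.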
📝 Abstract
Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use symmetric architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of asymmetric setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest poor data collection, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.
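The symmetric-versus-asymmetric distinction above is just a matter of parameter counts. The following sketch compares an actor shrunk to narrow hidden layers against a full-size critic; the layer widths and the 17-dimensional observation / 6-dimensional action sizes are illustrative assumptions (roughly MuJoCo-scale), not the paper's configuration:

```python
def mlp_params(sizes):
    """Total weights + biases in a fully connected MLP
    with the given list of layer sizes."""
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))

# Symmetric setup: actor and critic share the same topology.
sym_actor = mlp_params([17, 256, 256, 6])      # 71942 parameters
# Asymmetric setup: much smaller actor, critic unchanged.
asym_actor = mlp_params([17, 32, 32, 6])       # 1830 parameters
# Critic takes (state, action) and outputs a scalar Q-value.
critic = mlp_params([17 + 6, 256, 256, 1])
```

Even this toy calculation shows the actor shrinking by well over an order of magnitude, which is the regime where the abstract reports degraded data collection and critic overfitting.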