🤖 AI Summary
This paper challenges the necessity of value representation in reinforcement learning (RL) models, arguing that even policy gradient (PG) methods—which avoid explicit action-value computation—implicitly rely on value concepts due to underlying modeling assumptions. Method: The authors systematically relax standard RL assumptions—Markovian dynamics, exponential discounting, risk neutrality, and full observability—and analyze how each affects the definability and role of value functions. Contribution/Results: They establish that value representation cannot be avoided by algorithmic choice (e.g., preferring PG over value-based methods), because it stems fundamentally from the structural assumptions of the optimization objective. Consequently, they reconceptualize “model” in cognitive science beyond parametric statistical complexity to incorporate computational complexity and algorithmic properties. This work identifies objective-level assumptions—not algorithmic choices—as the source of value’s inevitability, and proposes an evaluation paradigm for cognitive models that jointly accounts for statistical and computational dimensions.
📝 Abstract
Action-values play a central role in popular Reinforcement Learning (RL) models of behavior. Yet, the idea that action-values are explicitly represented has been extensively debated. Critics have therefore repeatedly suggested that policy-gradient (PG) models should be favored over value-based (VB) ones, as a potential solution to this dilemma. Here we argue that this solution is unsatisfying. This is because PG methods are not, in fact, "Value-free" -- while they do not rely on an explicit representation of Value for acting (stimulus-response mapping), they do require it for learning. Hence, switching to PG models is, per se, insufficient for eliminating Value from models of behavior. More broadly, the requirement for a representation of Value stems from the underlying assumptions regarding the optimization objective posed by the standard RL framework, not from the particular algorithm chosen to solve it. Previous studies mostly took these standard RL assumptions for granted, as part of their conceptualization or problem modeling, while debating the different methods used to optimize the objective (i.e., PG or VB). We propose that, instead, the focus of the debate should shift to critically evaluating the underlying modeling assumptions. Such evaluation is particularly important from an experimental perspective. Indeed, the very notion of Value must be reconsidered when standard assumptions (e.g., risk neutrality, full observability, Markovian environment, exponential discounting) are relaxed, as is likely in natural settings. Finally, we use the Value debate as a case study to argue in favor of a more nuanced, algorithmic rather than statistical, view of what constitutes "a model" in cognitive sciences. Our analysis suggests that besides "parametric" statistical complexity, additional aspects such as computational complexity must also be taken into account when evaluating model complexity.
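The abstract's core claim—that PG methods need no Value representation for *acting* but do need one for *learning*—can be illustrated with a minimal REINFORCE-with-baseline sketch. This is not the paper's own model; the bandit, rewards, and learning rates below are hypothetical choices for illustration. Note how action selection uses only the policy, while the parameter update is weighted by a return relative to a running value estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: arm 1 pays 1.0, arm 0 pays 0.2.
def reward(action):
    return 1.0 if action == 1 else 0.2

def policy(theta):
    # Softmax over action preferences.
    e = np.exp(theta - theta.max())
    return e / e.sum()

theta = np.zeros(2)   # policy parameters (action preferences)
baseline = 0.0        # running average return -- an (implicit) value estimate
alpha, beta = 0.1, 0.1

for _ in range(2000):
    p = policy(theta)
    a = rng.choice(2, p=p)   # ACTING: requires only the policy, no Value
    G = reward(a)            # Monte Carlo return for this episode

    # LEARNING: the score function is scaled by (G - baseline),
    # i.e., a return measured against a value estimate.
    grad_log = -p
    grad_log[a] += 1.0
    theta += alpha * (G - baseline) * grad_log
    baseline += beta * (G - baseline)  # update the value estimate

print(policy(theta))  # probability mass concentrates on the better arm
```

Even dropping the baseline does not make the update "Value-free": the return `G` itself is a sample-based estimate of the policy's value, which is the sense in which the abstract argues Value re-enters through the learning rule.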