🤖 AI Summary
This work addresses the lack of a general mechanism in reinforcement learning for dynamically adjusting the granularity of state-action abstraction; without one, abstraction often forces a trade-off between simplifying the task and preserving critical information. The authors propose an adaptive soft abstraction method grounded in rate-distortion theory, which builds soft abstractions over both state and action spaces whose resolution can be tuned continuously, and uses a bisimulation metric to bound the error the abstraction introduces. A performance certificate decomposes value error into a learning component (bounded by a Bellman residual) and an abstraction component (bounded by the bisimulation metric), and the resulting error-driven strategy refines the abstraction once these two error sources become comparable. Empirical results show that the approach achieves near-optimal performance across a range of tabular environments, even under substantial lossy compression of state and action information.
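The switching rule described above can be sketched for a small tabular MDP. This is a minimal illustration under stated assumptions, not the authors' implementation: the learning error is bounded with the standard contraction argument ||Q* - Q||_inf <= ||TQ - Q||_inf / (1 - gamma), the abstraction error bound is simply passed in as a number (in the paper it comes from a bisimulation metric, which is not computed here), and the names `bellman_residual`, `learning_error_bound`, `should_refine` and the `ratio` threshold are hypothetical.

```python
from typing import List, Tuple

# Hypothetical tabular encoding: P[s][a] lists (probability, next_state) pairs,
# R[s][a] is the expected immediate reward, Q[s][a] the current value estimate.
Transitions = List[List[List[Tuple[float, int]]]]
Table = List[List[float]]


def bellman_residual(Q: Table, P: Transitions, R: Table, gamma: float) -> float:
    """Sup-norm ||T Q - Q||_inf for the Bellman optimality operator T."""
    worst = 0.0
    for s in range(len(Q)):
        for a in range(len(Q[s])):
            backup = R[s][a] + gamma * sum(p * max(Q[s2]) for p, s2 in P[s][a])
            worst = max(worst, abs(backup - Q[s][a]))
    return worst


def learning_error_bound(Q: Table, P: Transitions, R: Table, gamma: float) -> float:
    """T is a gamma-contraction, so ||Q* - Q||_inf <= ||T Q - Q||_inf / (1 - gamma)."""
    return bellman_residual(Q, P, R, gamma) / (1.0 - gamma)


def should_refine(learning_bound: float, abstraction_bound: float, ratio: float = 1.0) -> bool:
    """Hypothetical switching rule: refine once the learning error bound is no
    longer larger than ratio * abstraction error bound."""
    return learning_bound <= ratio * abstraction_bound


if __name__ == "__main__":
    # Two states, one action: state 0 moves to absorbing state 1, which pays reward 1.
    P = [[[(1.0, 1)]], [[(1.0, 1)]]]
    R = [[0.0], [1.0]]
    gamma = 0.5
    abstraction_bound = 0.5  # assumed to come from a bisimulation metric
    for Q in ([[0.0], [0.0]], [[1.0], [1.9]]):
        bound = learning_error_bound(Q, P, R, gamma)
        print(bound, should_refine(bound, abstraction_bound))
```

With `ratio = 1.0` the rule refines exactly when the learning bound has dropped to the abstraction bound; a larger `ratio` refines earlier, a smaller one keeps practising at the current resolution for longer.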
📝 Abstract
When learning to walk, infants seem to address a coarse version of the problem first (stay upright, reach the caregiver) and refine it only when further practice at that resolution stops paying off. Reinforcement learning offers multiple techniques for building simple versions of complex tasks, but lacks general principles for how to dynamically adjust the granularity of these abstractions during learning. This paper proposes one such principle: refine the abstraction as soon as the learning error within it becomes comparable to the error induced by the abstraction itself. Here, we investigate one way of formalising this principle via a performance certificate that decomposes value error into two terms: a learning error bound captured by a Bellman residual, and an abstraction error bound given by a bisimulation metric. The resulting switching strategy is implemented by soft state-action abstractions built from rate-distortion principles, whose resolution along state and action axes can be continuously adjusted. We validate this construction in a range of tabular settings, showing that near-optimal performance can be achieved under substantial lossy compression of state and action information.
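One standard way to build soft abstractions from rate-distortion principles is the Blahut-Arimoto iteration, sketched below as an illustration; the paper's exact construction may differ. Each concrete state (or action) x is softly assigned to abstract codes z by an encoder q(z|x) that minimises I(X;Z) + beta * E[d(X,Z)], so beta acts as a continuous resolution knob: beta = 0 maps every x to the same code distribution (zero rate), while large beta approaches a hard nearest-code assignment. The distortion matrix `dist` is an assumed input (a bisimulation-style distance would be one choice), running the routine separately over states and over actions with two different beta values is one hypothetical way to get independent resolution along each axis, and the function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def blahut_arimoto(p_x: List[float], dist: Matrix, beta: float, iters: int = 200) -> Matrix:
    """Soft encoder q(z|x) at the Blahut-Arimoto fixed point for min I(X;Z) + beta * E[d(X,Z)]."""
    n_x, n_z = len(dist), len(dist[0])
    q_z = [1.0 / n_z] * n_z  # start from a uniform marginal over abstract codes
    enc = [[0.0] * n_z for _ in range(n_x)]
    for _ in range(iters):
        # Encoder step: q(z|x) proportional to q(z) * exp(-beta * d(x, z)).
        for x in range(n_x):
            d_min = min(dist[x])  # shift distances for numerical stability
            w = [q_z[z] * math.exp(-beta * (dist[x][z] - d_min)) for z in range(n_z)]
            norm = sum(w)
            enc[x] = [wz / norm for wz in w]
        # Marginal step: q(z) = sum_x p(x) q(z|x).
        q_z = [sum(p_x[x] * enc[x][z] for x in range(n_x)) for z in range(n_z)]
    return enc


def rate_and_distortion(p_x: List[float], dist: Matrix, enc: Matrix) -> Tuple[float, float]:
    """Rate I(X;Z) in nats and expected distortion E[d(X,Z)] of a soft encoder."""
    n_x, n_z = len(dist), len(dist[0])
    q_z = [sum(p_x[x] * enc[x][z] for x in range(n_x)) for z in range(n_z)]
    rate = sum(p_x[x] * enc[x][z] * math.log(enc[x][z] / q_z[z])
               for x in range(n_x) for z in range(n_z) if enc[x][z] > 0)
    distortion = sum(p_x[x] * enc[x][z] * dist[x][z]
                     for x in range(n_x) for z in range(n_z))
    return rate, distortion


if __name__ == "__main__":
    # Four concrete states at positions 0, 1, 10, 11; two abstract codes at 0.5 and 10.5.
    positions, codes = [0.0, 1.0, 10.0, 11.0], [0.5, 10.5]
    dist = [[abs(p - c) for c in codes] for p in positions]
    p_x = [0.25] * 4
    for beta in (0.0, 0.1, 1.0, 5.0):
        r, d = rate_and_distortion(p_x, dist, blahut_arimoto(p_x, dist, beta))
        print(f"beta={beta}: rate={r:.3f} nats, distortion={d:.3f}")
```

Sweeping `beta` upward traces the rate-distortion trade-off: for the two well-separated clusters in the example, rate rises from 0 toward log 2 nats while expected distortion falls toward 0.5.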