AI Summary
This work investigates structural redundancy and learning efficiency in zero-shot task composition within Boolean Task Algebra (BTA). By analyzing the space of optimal extended Q-functions in deterministic Markov decision processes, the study shows that this space is fully determined by the universal and empty tasks. Building on this insight, the authors propose a goal-set-based composition method that applies logical operations to goal sets and reconstructs the composed value function, so new policies can be composed without learning additional base tasks. The theoretical analysis shows that the base tasks in BTA are redundant, and a counterexample shows that this property need not hold in stochastic environments, where optimal composition may require a number of policies exponential in the number of goals. Empirical evaluations across tabular, visual, function-approximation, and continuous-control settings show that the approach reduces learning cost for standard BTA and composition time for both BTA and Skill Machines while preserving policy performance, and that learning additional base tasks does not improve performance.
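A compact way to state the collapse, written as a sketch in our own notation rather than the paper's: assume each task M is identified with its goal set \mathcal{G}_M \subseteq \mathcal{G}, the universal task \mathcal{U} treats every goal as desirable and the empty task \emptyset treats none as desirable, and \bar{Q}^{*} denotes an optimal extended Q-function over states s, goals g, and actions a. The claim that every such function is determined by the universal and empty tasks then reads:

```latex
\bar{Q}^{*}_{M}(s,g,a) =
\begin{cases}
  \bar{Q}^{*}_{\mathcal{U}}(s,g,a) & \text{if } g \in \mathcal{G}_M,\\[2pt]
  \bar{Q}^{*}_{\emptyset}(s,g,a)   & \text{if } g \notin \mathcal{G}_M,
\end{cases}
\qquad \text{(deterministic MDPs)}
```

Under this reading, union, intersection, and complement of goal sets select the same per-goal slices that BTA's disjunction, conjunction, and negation operators produce on value functions.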
Abstract
The Boolean Task Algebra (BTA) provides a principled framework for zero-shot task composition in reinforcement learning by equipping goal-reaching tasks with Boolean operations. We revisit its structural assumptions and formalize a collapse in the space of optimal extended Q-value functions: in deterministic MDPs, every such function is fully determined by the universal and empty tasks. This makes the logarithmic set of base tasks proposed in the original BTA formulation redundant. Building on this observation, we introduce a goal-set-based composition method that performs logical operations on goal sets and reconstructs composed value functions by selecting slices from the universal and empty value functions. This reduces learning cost for standard BTA and composition time for both BTA and Skill Machines, while preserving policy performance. Experiments across tabular, visual, function-approximation, and continuous-control domains show that learning additional base tasks does not yield better performance. Finally, we study the stochastic setting and provide a counterexample showing that this collapse need not hold: optimal composition may require accounting for a number of policies exponential in the number of goals. Code is available at https://github.com/EduardoTerres/bta_paper.
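The slice-based reconstruction described above can be illustrated with a minimal pure-Python sketch. It is not the code from the linked repository, and several things are assumptions: extended Q-tables are plain dicts keyed by (state, goal, action); the universal and empty tables are made-up integers (a shared, task-independent term plus a terminal reward R_MAX or R_MIN) rather than learned values; and every function name here (toy_universal_and_empty, compose_from_goal_set, q_or, q_and, q_not) is hypothetical. For comparison it also implements the value-function operators of the original BTA formulation as we recall them (max for disjunction, min for conjunction, Q_U + Q_empty - Q for negation) and checks that logic on goal sets reproduces them.

```python
from itertools import product

# Hypothetical toy sizes (not from the paper): 3 states, 4 goals, 2 actions.
STATES, GOALS, ACTIONS = range(3), range(4), range(2)
R_MAX, R_MIN = 10, -10  # assumed terminal rewards for desired / undesired goals


def toy_universal_and_empty():
    """Made-up Q_U and Q_empty tables keyed by (s, g, a).

    Each entry is a shared, task-independent term plus the terminal reward
    at g: R_MAX for the universal task, R_MIN for the empty task. Integers
    keep the equality checks below exact.
    """
    q_u, q_e = {}, {}
    for s, g, a in product(STATES, GOALS, ACTIONS):
        shared = -((s + g + a) % 5)  # placeholder for the return on the way to g
        q_u[(s, g, a)] = shared + R_MAX
        q_e[(s, g, a)] = shared + R_MIN
    return q_u, q_e


def compose_from_goal_set(goal_set, q_u, q_e):
    """Slice reconstruction: take Q_U where g is a goal of the task, else Q_empty."""
    return {key: (q_u[key] if key[1] in goal_set else q_e[key]) for key in q_u}


# Value-function operators of the original BTA formulation, for comparison.
def q_or(q1, q2):
    return {k: max(q1[k], q2[k]) for k in q1}


def q_and(q1, q2):
    return {k: min(q1[k], q2[k]) for k in q1}


def q_not(q, q_u, q_e):
    return {k: q_u[k] + q_e[k] - q[k] for k in q}


if __name__ == "__main__":
    q_u, q_e = toy_universal_and_empty()
    a_goals, b_goals = {0, 1}, {1, 2}
    q_a = compose_from_goal_set(a_goals, q_u, q_e)
    q_b = compose_from_goal_set(b_goals, q_u, q_e)

    # Logic on goal sets reproduces logic on value functions.
    assert compose_from_goal_set(a_goals | b_goals, q_u, q_e) == q_or(q_a, q_b)
    assert compose_from_goal_set(a_goals & b_goals, q_u, q_e) == q_and(q_a, q_b)
    assert compose_from_goal_set(set(GOALS) - a_goals, q_u, q_e) == q_not(q_a, q_u, q_e)
    print("goal-set composition matches BTA operators")
```

In this sketch only the universal and empty tables are ever built; every other task's table comes from a single per-goal selection, which mirrors why the abstract argues that the extra base tasks add nothing in the deterministic case. The agreement with max and min relies on Q_U exceeding Q_empty at every entry, which holds here because R_MAX > R_MIN.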