🤖 AI Summary
Reinforcement learning (RL) policies suffer from inefficiency and instability in high-dimensional, noisy environments, while existing hierarchical RL (HRL) methods rely heavily on handcrafted priors and strong task-decomposition assumptions. Method: this paper proposes SIDM, a Structural Information principles-based Decision-Making framework. SIDM introduces (1) an unsupervised state-action community detection and embedding-aggregation mechanism that weights vertices by structural entropy; (2) hierarchical state and action abstraction via optimal encoding trees, obtained by minimizing the structural entropy of a transition graph built from historical trajectories; and (3) a two-level skill-learning architecture that identifies high-probability transition paths via common path entropy, requiring no expert knowledge. The framework is agnostic to the underlying RL algorithm and supports plug-and-play integration with both single-agent and multi-agent RL. Results: on multiple challenging benchmarks, SIDM improves policy quality, stability, and sample efficiency by up to 32.70%, 88.26%, and 64.86%, respectively, outperforming state-of-the-art baselines.
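The entropy-weighted aggregation step above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the one-dimensional entropy form, the use of community volume as the normalizer, and the names `vertex_entropy` and `aggregate_community` are all assumptions made for the sketch.

```python
import math

def vertex_entropy(degree, volume):
    # Entropy contribution of one vertex: -(d_v / vol) * log2(d_v / vol).
    # Normalizing by community volume is an assumption for this sketch.
    p = degree / volume
    return -p * math.log2(p)

def aggregate_community(embeddings, degrees):
    """Aggregate vertex embeddings into a single community embedding,
    weighting each vertex by its normalized structural-entropy contribution.

    embeddings: list of equal-length feature vectors (one per vertex)
    degrees:    vertex degrees in the state/action similarity graph
    """
    volume = sum(degrees)
    weights = [vertex_entropy(d, volume) for d in degrees]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize within the community
    dim = len(embeddings[0])
    return [sum(w * e[i] for w, e in zip(weights, embeddings))
            for i in range(dim)]
```

With equal degrees the weights are uniform, so `aggregate_community([[1.0, 0.0], [0.0, 1.0]], [2, 2])` reduces to a plain mean of the two vectors.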
📝 Abstract
Although Reinforcement Learning (RL) algorithms acquire sequential behavioral patterns through interaction with the environment, their effectiveness in noisy, high-dimensional scenarios typically relies on specific structural priors. In this paper, we propose SIDM, a novel and general Structural Information principles-based framework for effective Decision-Making, approached from an information-theoretic perspective. We present an unsupervised partitioning method that forms vertex communities in the state and action spaces based on feature similarities. Within each community, an aggregation function that uses structural entropy as the vertex weight produces the community's embedding, thereby enabling hierarchical state and action abstractions. From abstract elements extracted from historical trajectories, a directed, weighted, homogeneous transition graph is constructed; minimizing this graph's high-dimensional structural entropy yields an optimal encoding tree. A two-layer skill-based learning mechanism then computes the common path entropy of each state transition as its identified probability, obviating the need for expert knowledge. Moreover, SIDM can be flexibly incorporated into various single-agent and multi-agent RL algorithms to enhance their performance. Finally, extensive evaluations on challenging benchmarks demonstrate that, compared with state-of-the-art baselines, our framework significantly and consistently improves policy quality, stability, and efficiency by up to 32.70%, 88.26%, and 64.86%, respectively.
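For reference, the structural entropy being minimized is, at depth two, the standard two-dimensional structural entropy of a graph under a fixed partition (Li and Pan's formulation). A minimal sketch of evaluating that quantity is shown below; the function name and edge-list representation are assumptions, and the paper optimizes this objective over full encoding trees rather than merely evaluating a given partition.

```python
import math

def structural_entropy_2d(edges, partition):
    """Two-dimensional structural entropy of an undirected graph under a
    given community partition.

    edges:     list of (u, v) pairs (undirected, no self-loops)
    partition: dict mapping vertex -> community id
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    two_m = 2 * len(edges)  # total volume of the graph

    # Community volume and cut size (edges leaving each community).
    vol, cut = {}, {}
    for v, d in degree.items():
        c = partition[v]
        vol[c] = vol.get(c, 0) + d
    for u, v in edges:
        if partition[u] != partition[v]:
            cut[partition[u]] = cut.get(partition[u], 0) + 1
            cut[partition[v]] = cut.get(partition[v], 0) + 1

    h = 0.0
    # Intra-community term: uncertainty of locating a vertex inside its community.
    for v, d in degree.items():
        h -= (d / two_m) * math.log2(d / vol[partition[v]])
    # Inter-community term: weighted by the fraction of cut edges.
    for c, volume in vol.items():
        g = cut.get(c, 0)
        h -= (g / two_m) * math.log2(volume / two_m)
    return h
```

A partition that places densely connected vertices together (small cuts) drives both terms down, which is why minimizing this entropy recovers community structure without supervision.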