Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing diffusion-based offline reinforcement learning methods suffer from poor task adaptability and limited decision-making flexibility due to rigid two-level architectures and single-timescale diffusion processes. To address this, we propose a state-community-structured adaptive hierarchical diffusion framework. Our method leverages community detection on state-transition graphs to extract topological structure, which serves as a conditional signal for dynamically constructing multi-timescale diffusion hierarchies. We further introduce structural entropy regularization to enhance policy stability, exploration efficiency, and robustness against data distribution shifts. By unifying diffusion generative modeling, hierarchical temporal abstraction, and graph-structured perception, the framework enables effective modeling of long-horizon, sparse-reward trajectories. Empirical evaluation on multiple challenging offline RL benchmarks demonstrates substantial improvements over state-of-the-art methods, achieving superior decision-making performance and enhanced cross-scenario generalization.
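The summary states that communities detected on a state-transition graph serve as the conditioning signal for the diffusion hierarchy. As a minimal, hypothetical sketch (the paper's actual detection algorithm is not specified here), the following builds an undirected transition graph from offline trajectories and partitions it with a simple deterministic label-propagation pass; the function names and tie-breaking rule are illustrative assumptions:

```python
from collections import Counter, defaultdict

def transition_graph(trajectories):
    """Build an undirected state-transition graph from offline trajectories.
    Nodes are states; an edge links consecutively visited states."""
    adj = defaultdict(set)
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            if s != s_next:
                adj[s].add(s_next)
                adj[s_next].add(s)
    return adj

def label_propagation(adj, max_iter=20):
    """Deterministic label propagation: each state adopts the most common
    label among its neighbours (ties broken by the largest label).
    Returns a state -> community-label mapping."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = max(l for l, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels
```

On trajectories that form two densely connected state clusters joined by a single transition, the returned labels split the states into two communities, which could then condition separate diffusion layers.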

📝 Abstract
Diffusion-based generative methods have shown promising potential for modeling trajectories from offline reinforcement learning (RL) datasets, and hierarchical diffusion has been introduced to mitigate variance accumulation and computational challenges in long-horizon planning tasks. However, existing approaches typically assume a fixed two-layer diffusion hierarchy with a single predefined temporal scale, which limits adaptability to diverse downstream tasks and reduces flexibility in decision making. In this work, we propose SIHD, a novel Structural Information-based Hierarchical Diffusion framework for effective and stable offline policy learning in long-horizon environments with sparse rewards. Specifically, we analyze structural information embedded in offline trajectories to construct the diffusion hierarchy adaptively, enabling flexible trajectory modeling across multiple temporal scales. Rather than relying on reward predictions from localized sub-trajectories, we quantify the structural information gain of each state community and use it as a conditioning signal within the corresponding diffusion layer. To reduce overreliance on offline datasets, we introduce a structural entropy regularizer that encourages exploration of underrepresented states while avoiding extrapolation errors from distributional shifts. Extensive evaluations on challenging offline RL tasks show that SIHD significantly outperforms state-of-the-art baselines in decision-making performance and demonstrates superior generalization across diverse scenarios.
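The abstract's structural entropy regularizer builds on the notion of structural entropy from structural information theory. As a hedged illustration only (the paper's exact regularizer is not given here), the standard one-dimensional structural entropy of an undirected graph can be computed as follows; `structural_entropy` is an assumed helper name:

```python
import math
from collections import Counter

def structural_entropy(edges):
    """One-dimensional structural entropy of an undirected graph:
    H1(G) = -sum_v (d_v / 2m) * log2(d_v / 2m),
    where d_v is the degree of node v and m the number of edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    two_m = sum(deg.values())  # 2m: total degree
    return -sum((d / two_m) * math.log2(d / two_m) for d in deg.values())
```

For a regular graph on n nodes the degree distribution is uniform, so the entropy equals log2(n); graphs with more skewed connectivity score lower, which is the kind of signal a regularizer can use to encourage visiting underrepresented states.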
Problem

Research questions and friction points this paper is trying to address.

Adaptive hierarchical diffusion for long-horizon offline RL
Structural information replaces reward-based trajectory modeling
Mitigating distributional shifts via entropy-regularized exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptively constructs diffusion hierarchy using structural information
Quantifies structural information gain as conditioning signal
Introduces a structural entropy regularizer to reduce overreliance on offline datasets
Xianghua Zeng
Beihang University
Structural Information Principles · Reinforcement Learning
Hao Peng
State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
Angsheng Li
State Key Laboratory of Software Development Environment, Beihang University, Beijing, China; Zhongguancun Laboratory, Beijing, China
Yicheng Pan
State Key Laboratory of Software Development Environment, Beihang University, Beijing, China