Adaptive Multi-Fidelity Reinforcement Learning for Variance Reduction in Engineering Design Optimization

📅 2025-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-fidelity reinforcement learning (MF-RL) for engineering design optimization suffers from high policy-learning variance and unstable convergence when the underlying models exhibit heterogeneous error distributions across fidelity levels. This paper proposes a non-hierarchical, adaptive MF-RL framework that departs from manual fidelity scheduling: a novel low-fidelity policy alignment mechanism combines policy alignment evaluation, experience transfer reweighting, adaptive sampling control, and co-training, enabling dynamic synergy between heterogeneous low-fidelity models and a high-fidelity model. On an octocopter design optimization task, the framework reduces policy-learning variance by 42% and converges 3.1× faster than conventional hierarchical MF-RL methods, while markedly improving the consistency of solution quality and removing manual scheduling overhead.
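
As one concrete reading of the alignment mechanism described above, the minimal Python sketch below scores how closely a low-fidelity policy agrees with the current high-fidelity policy on a shared probe set of states, then converts the scores into transfer weights with a softmax. The discrepancy-based scoring rule, the `probe_states` set, and the policies-as-callables interface are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def alignment_score(pi_low, pi_high, probe_states):
    """Negative mean action discrepancy on probe states (higher = better aligned).
    Assumes deterministic policies exposed as state -> action callables."""
    a_low = np.array([pi_low(s) for s in probe_states])
    a_high = np.array([pi_high(s) for s in probe_states])
    return -float(np.mean(np.linalg.norm(a_low - a_high, axis=-1)))

def fidelity_weights(scores, temperature=1.0):
    """Softmax over per-model alignment scores -> transfer/sampling weights."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()
```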

📝 Abstract
Multi-fidelity reinforcement learning (RL) frameworks use computational resources efficiently by integrating analysis models of varying accuracy and cost. The prevailing methodologies, characterized by transfer learning, human-inspired strategies, control variate techniques, and adaptive sampling, predominantly depend on a structured hierarchy of models. However, this reliance on a model hierarchy can exacerbate variance in policy learning when the underlying models exhibit heterogeneous error distributions across the design space. To address this challenge, this work proposes a novel adaptive multi-fidelity RL framework in which multiple heterogeneous, non-hierarchical low-fidelity models are dynamically leveraged alongside a high-fidelity model to learn a high-fidelity policy efficiently. Specifically, low-fidelity policies and their experience data are adaptively used for targeted learning, guided by their alignment with the high-fidelity policy. The effectiveness of the approach is demonstrated on an octocopter design optimization problem, using two low-fidelity models alongside a high-fidelity simulator. The results show that the proposed approach substantially reduces variance in policy learning, leading to improved convergence and consistently high-quality solutions relative to traditional hierarchical multi-fidelity RL methods. Moreover, the framework eliminates the need to manually tune model usage schedules, positioning it as an effective variance-reduction strategy for multi-fidelity RL while also removing the computational and operational burden of manual fidelity scheduling.
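
One plausible shape for the experience transfer reweighting the abstract mentions is weighted replay: minibatches are drawn from each low-fidelity model's experience buffer in proportion to an alignment-based weight per model (e.g., the `fidelity_weights` output from the sketch above). The buffer layout and sampling scheme below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_minibatch(buffers, weights, batch_size=64):
    """Draw a replay minibatch, favoring better-aligned models.
    buffers: {model_id: list of transitions}
    weights: {model_id: transfer weight}, e.g. alignment-derived."""
    model_ids = list(buffers)
    p = np.array([weights[m] for m in model_ids], dtype=float)
    p = p / p.sum()  # normalize in case weights do not sum to 1
    picks = rng.choice(len(model_ids), size=batch_size, p=p)
    return [buffers[model_ids[i]][rng.integers(len(buffers[model_ids[i]]))]
            for i in picks]
```
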
Problem

Research questions and friction points this paper is trying to address.

High policy-learning variance when fidelity levels have heterogeneous error distributions
Manual tuning of model usage schedules adds significant computational and operational overhead
Slow, inconsistent convergence in engineering design optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive multi-fidelity RL with non-hierarchical models
Dynamic leveraging of heterogeneous low-fidelity models
Alignment-guided targeted learning and adaptive sampling control for variance reduction (see the budget-allocation sketch below)
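
A minimal sketch of what the adaptive sampling control could look like, assuming a fixed high-fidelity budget share (`hf_fraction` is a hypothetical parameter) and alignment-derived weights for the low-fidelity models; the paper's actual control rule may differ.

```python
def allocate_rollouts(total_budget, lf_weights, hf_fraction=0.2):
    """Reserve a fixed high-fidelity share of the rollout budget, then split
    the remainder across low-fidelity models by alignment weight."""
    hf_budget = int(round(total_budget * hf_fraction))
    lf_total = total_budget - hf_budget
    lf_budget = {m: int(round(lf_total * w)) for m, w in lf_weights.items()}
    return hf_budget, lf_budget

# Example with two hypothetical low-fidelity models:
# allocate_rollouts(1000, {"panel_code": 0.7, "surrogate": 0.3})
#   -> (200, {"panel_code": 560, "surrogate": 240})
```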