Beyond Pure Sampling: Hybrid Optimization Mechanisms for Non-Convex Model Predictive Control

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of navigating complex cost landscapes in non-convex model predictive control, where nonlinear dynamics and multiple obstacles often trap gradient-based methods in suboptimal local minima. To overcome this limitation, we propose a Maximum Entropy Differential Dynamic Programming (ME-DDP) framework that integrates deterministic optimization with entropy-maximizing sampling. Our approach employs a two-stage mechanism: it first performs local gradient-based refinement via DDP, then samples from policies shaped by the inverse Hessian of the action-value function, which lets the optimizer escape local minima and balances global exploration with local exploitation. We analyze three ME-DDP variants (Unimodal Gaussian ME-DDP, Multimodal Gaussian ME-DDP, and Stein Variational DDP), relate them to Model Predictive Path Integral (MPPI) control, and benchmark them on navigation tasks with four robotic systems. ME-DDP consistently outperforms MPPI in low-dimensional settings and maintains a higher, more stable success rate in high-dimensional systems. Hardware experiments with a quadrotor flying through a dense obstacle field confirm its robustness.

📝 Abstract

This paper investigates the optimization mechanisms of non-convex Model Predictive Control (MPC) using the Maximum Entropy Differential Dynamic Programming (ME-DDP) framework. Navigating non-convex cost landscapes induced by nonlinear dynamics, multiple obstacles, and similar factors remains a fundamental challenge in robotics, where gradient-based methods frequently converge to suboptimal local minima. We demonstrate a two-step optimization mechanism designed to overcome these traps: (1) an initial phase in which DDP exploits the gradient of the cost landscape, followed by (2) disruption of the optimization via sampling from policies characterized by the inverse Hessian of the action-value function. We provide a rigorous analysis of this sampling mechanism for three ME-DDP variants: Unimodal Gaussian ME-DDP, Multimodal Gaussian ME-DDP, and Stein Variational DDP. Furthermore, using navigation tasks with four robotic systems in cluttered environments, we extensively benchmark the three ME-DDP variants against deterministic DDP and against one of the most successful sampling-based schemes, Model Predictive Path Integral (MPPI) control, with three policy parameterizations and update laws that correspond to those of the ME-DDP variants. The results show that in low-dimensional systems, where the cost landscapes are relatively simple and local information is sufficiently representative, our framework consistently outperforms the MPPI variants. In high-dimensional systems, MPPI can occasionally discover aggressive maneuvers that steer the system faster than DDP-based methods, whereas our method maintains a higher, more stable success rate. Finally, we validate the practical efficacy of the framework through hardware experiments with a quadrotor navigating a dense, non-convex obstacle field, confirming the robustness of the proposed framework for real-world deployment.
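The sampling step in (2) can be illustrated with a minimal sketch. It assumes the unimodal Gaussian case, in which a policy proportional to exp(-Q/alpha) under a quadratic approximation of the action-value function Q has covariance alpha times the inverse of the control Hessian. The names `sample_controls`, `Q_uu`, and `alpha` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def sample_controls(u_mean, Q_uu, alpha, n_samples, rng):
    """Draw controls from N(u_mean, alpha * inv(Q_uu)).

    u_mean : (m,) nominal control, e.g. the DDP solution (assumed given)
    Q_uu   : (m, m) control Hessian of the action-value function,
             assumed symmetric positive definite
    alpha  : temperature; larger values widen the exploration (assumed name)
    """
    # Factor Q_uu = L L^T; raises LinAlgError if Q_uu is not positive definite.
    L = np.linalg.cholesky(Q_uu)
    # Standard normal noise, one column per sample.
    z = rng.standard_normal((u_mean.shape[0], n_samples))
    # Solving L^T eps = z gives cov(eps) = inv(L^T) inv(L) = inv(Q_uu),
    # without forming the inverse explicitly.
    eps = np.linalg.solve(L.T, z)
    return u_mean[:, None] + np.sqrt(alpha) * eps  # shape (m, n_samples)

rng = np.random.default_rng(0)
# Steep cost along the first control axis, flat along the second.
Q_uu = np.array([[4.0, 0.0], [0.0, 0.25]])
samples = sample_controls(np.zeros(2), Q_uu, alpha=1.0, n_samples=20000, rng=rng)
print(np.cov(samples))  # close to inv(Q_uu): small spread on axis 0, large on axis 1
```

Directions in which the cost is flat (small curvature) receive wide exploration, while steep directions are sampled tightly, which is one way to read the claim that the inverse Hessian guides the sampling.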

Problem

Research questions and friction points this paper is trying to address.

Non-Convex MPC

Model Predictive Control

Local Minima

Robot Navigation

Non-Convex Optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Optimization

Non-Convex MPC

Maximum Entropy DDP