Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance

📅 2025-11-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing generative weight-generation methods suffer from two key bottlenecks: (1) excessive coupling between weight generation and task-specific objectives, which undermines the generalizability of the learned optimizer; and (2) the absence of local constraints, which creates a long-horizon problem that degrades both inference efficiency and accuracy. To address these, we propose Lo-Hp, a decoupled two-stage weight-generation framework. In Stage I, it learns task-agnostic local optimization policies; in Stage II, it generates weights via a hybrid-policy sub-trajectory balance objective that integrates on-policy and off-policy learning. We theoretically establish that learning only local optimization policies mitigates the long-horizon issue and promotes convergence toward globally optimal weights. Extensive experiments demonstrate that Lo-Hp improves both accuracy and inference speed across diverse scenarios, including transfer learning, few-shot learning, domain generalization, and large language model adaptation.

📝 Abstract
Recent advances in generative modeling enable neural networks to generate weights without relying on gradient-based optimization. However, current methods are limited by two issues: over-coupling and long-horizon generation. The former tightly binds weight generation to task-specific objectives, limiting the flexibility of the learned optimizer. The latter, caused by the lack of local constraints, leads to inefficiency and low accuracy during inference. In this paper, we propose Lo-Hp, a decoupled two-stage weight generation framework that enhances flexibility by learning various optimization policies. It adopts a hybrid-policy sub-trajectory balance objective, which integrates on-policy and off-policy learning to capture local optimization policies. Theoretically, we demonstrate that learning solely local optimization policies can address the long-horizon issue while enhancing the generation of globally optimal weights. In addition, we validate Lo-Hp's superior accuracy and inference efficiency on tasks that require frequent weight updates, such as transfer learning, few-shot learning, domain generalization, and large language model adaptation.
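For orientation, sub-trajectory balance is a training objective from the GFlowNet literature; the sketch below shows the standard condition only, not necessarily the exact hybrid-policy form used by Lo-Hp. For a sub-trajectory $s_m \to s_{m+1} \to \dots \to s_n$ with a learned state-flow estimate $F$, a forward (generation) policy $P_F$, and a backward policy $P_B$, balance requires

$$
F(s_m)\prod_{i=m}^{n-1} P_F(s_{i+1}\mid s_i) \;=\; F(s_n)\prod_{i=m}^{n-1} P_B(s_i\mid s_{i+1}),
$$

and training minimizes the squared log-ratio of the two sides over sampled sub-trajectories. The hybrid-policy variant described in the abstract mixes sub-trajectories collected on-policy with off-policy ones; the exact formulation is given in the paper.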
Problem

Research questions and friction points this paper is trying to address.

Addresses over-coupling in neural weight generation methods
Solves long-horizon optimization inefficiency and accuracy issues
Enhances weight generation for transfer and few-shot learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled two-stage weight generation framework
Hybrid-policy sub-trajectory balance objective
Learning local optimization policies for efficiency
🔎 Similar Papers
No similar papers found.
Yunchuan Guan
Huazhong University of Science and Technology
Yu Liu
Huazhong University of Science and Technology
Ke Zhou
Huazhong University of Science and Technology
Hui Li
Jinan Inspur Data Technology Co.
Sen Jia
VitaSight
Zhiqi Shen
Nanyang Technological University
Ziyang Wang
University of Oxford
Xinglin Zhang
Shanghai Medical Image Insights Intelligent Technology Co.
Tao Chen
University of Waterloo
Jenq-Neng Hwang
University of Washington
Lei Li
VitaSight