GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning

📅 2026-06-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a limitation of reinforcement learning in non-stationary financial markets: raw states and short-term return rewards provide weak learning signals, which hinders policy generalization. The authors propose GIFT, a framework built on PPO that uses large language models (LLMs) to design the state-reward interface rather than to make trading decisions. Specifically, the LLM generates enriched state features from financial-factor primitives and constructs auxiliary rewards from risk-aware rules, and diagnostics from PPO rollouts drive iterative refinement of candidate interfaces. The selected interface is fixed before evaluation, with no further LLM queries at test time. In rolling-window experiments across diverse market regimes and portfolio settings, GIFT improves out-of-sample risk-adjusted performance over baselines. The design injects financial knowledge while constraining open-ended LLM generation.
📝 Abstract
Financial portfolio trading is naturally formulated as a reinforcement learning problem, where an agent sequentially rebalances assets under changing market conditions to balance return, risk, and transaction costs. Yet in non-stationary markets, raw OHLCV states and short-horizon return rewards often provide an under-specified learning interface, motivating large language models as a way to inject financial knowledge into state and reward design while constraining open-ended generation. To this end, we propose GIFT, an LLM-guided framework for state-reward interface design in PPO-based financial reinforcement learning. Rather than using the LLM to make trading decisions, GIFT uses Factor-guided State Enhancement to generate state features from financial-factor primitives, Risk-rule-guided Reward Shaping to generate auxiliary rewards from portfolio-risk rules, and Diagnostic-guided Refinement to revise candidate interfaces using PPO rollout diagnostics. After refinement, GIFT fixes the selected state-reward interface before evaluation, with no further LLM queries or interface updates at test time. Comprehensive rolling-window experiments across diverse market regimes and portfolio scenarios demonstrate that GIFT improves learning-signal quality and out-of-sample risk-adjusted portfolio performance over baselines. Code and data are available at: https://github.com/KAG778/GIFT .
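To make the idea of a state-reward interface concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes two illustrative factor primitives (momentum and realized volatility) and one illustrative risk rule (a penalty on drawdown beyond a limit). All function names, the window length, the drawdown limit, and the penalty weight are hypothetical choices made for this example; the actual primitives and rules used by GIFT are in the linked repository.

```python
from statistics import pstdev


def momentum(closes, window):
    """Simple return over the last `window` steps (hypothetical factor primitive)."""
    return closes[-1] / closes[-1 - window] - 1.0


def realized_volatility(closes, window):
    """Population std of the last `window` one-step simple returns (hypothetical)."""
    recent = closes[-(window + 1):]
    returns = [b / a - 1.0 for a, b in zip(recent, recent[1:])]
    return pstdev(returns)


def enhanced_state(closes, window=3):
    """Raw last close plus two factor-style features appended to the state."""
    return [
        closes[-1],
        momentum(closes, window),
        realized_volatility(closes, window),
    ]


def drawdown(values):
    """Current drawdown of a portfolio value path relative to its running peak."""
    return 1.0 - values[-1] / max(values)


def shaped_reward(values, max_drawdown=0.1, penalty_weight=0.5):
    """One-step portfolio return minus a penalty on drawdown beyond a limit.

    `max_drawdown` and `penalty_weight` are assumed values for illustration.
    """
    base = values[-1] / values[-2] - 1.0
    excess = max(0.0, drawdown(values) - max_drawdown)
    return base - penalty_weight * excess


if __name__ == "__main__":
    closes = [100.0, 102.0, 101.0, 105.0]
    print(enhanced_state(closes))
    print(shaped_reward([100.0, 120.0, 90.0]))
```

In this sketch the PPO agent would observe `enhanced_state(...)` instead of raw prices and be trained on `shaped_reward(...)` instead of the plain one-step return. Both functions stay fixed once chosen, mirroring the abstract's statement that the interface is frozen before evaluation.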
Problem

Research questions and friction points this paper is trying to address.

financial reinforcement learning
state-reward interface
non-stationary markets
portfolio trading
learning signal quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided interface
state-reward design
financial reinforcement learning
reward shaping
factor-enhanced states
Yanyan Wu
East China University of Science and Technology
Boyi Zhang
University of Science and Technology of China
Yanlin Liu
Southwestern University of Finance and Economics
Xinyu Fang
University of Science and Technology of China
Jining Luan
University of Science and Technology of China
Meiqi Zhang
University of Sydney
Jiacheng Liu
University of Science and Technology of China
Hao Zeng
City University of Hong Kong
Dexu Yu
Northeastern University
Chang Liu
The University of Hong Kong
Natural Language Processing, Large Language Models, 3D Point Cloud, Multimodal Neural Networks
Hanwen Du
The Ohio State University
Machine Learning
Yongxin Ni
National University of Singapore
Recommender Systems
Youhua Li
City University of Hong Kong
LLM, Information Systems, Data Mining