GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of reinforcement learning in non-stationary financial markets: raw states and short-term return rewards provide too weak a learning signal, which hinders policy generalization. The authors propose GIFT, a PPO-based framework that uses large language models (LLMs) not to make trading decisions but to design the state-reward interface. Specifically, the LLM generates enriched state features from financial-factor primitives and auxiliary rewards from portfolio-risk rules, and diagnostic feedback from PPO rollouts drives iterative refinement of candidate interfaces. In rolling-window experiments across diverse market regimes and portfolio scenarios, GIFT improves out-of-sample risk-adjusted portfolio performance over baselines, injecting financial knowledge while constraining open-ended generation.
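To make the idea of a state-reward interface concrete, the following is a minimal sketch and not the authors' implementation. It assumes a state built from closing prices, two illustrative factor features (momentum and recent volatility), and two illustrative risk rules (a concentration cap and a drawdown penalty). The function names `enhance_state` and `shaped_reward`, and every parameter value, are hypothetical.

```python
import numpy as np

def enhance_state(close: np.ndarray, window: int = 5) -> np.ndarray:
    """Build two hypothetical factor features per asset.

    close: array of shape (T, n_assets) with T > window.
    Returns a vector of length 2 * n_assets: momentum, then volatility.
    """
    returns = close[1:] / close[:-1] - 1.0            # simple per-step returns
    momentum = close[-1] / close[-1 - window] - 1.0   # price change over `window` steps
    volatility = returns[-window:].std(axis=0)        # dispersion of recent returns
    return np.concatenate([momentum, volatility])

def shaped_reward(portfolio_return: float, weights: np.ndarray,
                  drawdown: float, max_weight: float = 0.5,
                  penalty: float = 0.1) -> float:
    """Base return plus an auxiliary term from two illustrative risk rules."""
    # Rule 1 (assumed): penalize any single position above `max_weight`.
    concentration = max(0.0, float(weights.max()) - max_weight)
    # Rule 2 (assumed): penalize the current drawdown, if positive.
    auxiliary = -penalty * (concentration + max(0.0, drawdown))
    return portfolio_return + auxiliary
```

In this sketch the policy would observe the output of `enhance_state` instead of raw prices alone and would be trained on `shaped_reward` instead of the raw return; in GIFT, the LLM proposes the features and rules rather than a human fixing them by hand.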

📝 Abstract

Financial portfolio trading is naturally formulated as a reinforcement learning problem, where an agent sequentially rebalances assets under changing market conditions to balance return, risk, and transaction costs. Yet in non-stationary markets, raw OHLCV states and short-horizon return rewards often provide an under-specified learning interface, motivating large language models as a way to inject financial knowledge into state and reward design while constraining open-ended generation. To this end, we propose GIFT, an LLM-guided framework for state-reward interface design in PPO-based financial reinforcement learning. Rather than using the LLM to make trading decisions, GIFT uses Factor-guided State Enhancement to generate state features from financial-factor primitives, Risk-rule-guided Reward Shaping to generate auxiliary rewards from portfolio-risk rules, and Diagnostic-guided Refinement to revise candidate interfaces using PPO rollout diagnostics. After refinement, GIFT fixes the selected state-reward interface before evaluation, with no further LLM queries or interface updates at test time. Comprehensive rolling-window experiments across diverse market regimes and portfolio scenarios demonstrate that GIFT improves learning-signal quality and out-of-sample risk-adjusted portfolio performance over baselines. Code and data are available at: https://github.com/KAG778/GIFT .
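The rolling-window, out-of-sample evaluation described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the paper's protocol: the window lengths, the roll-forward step, the use of the Sharpe ratio as the risk-adjusted metric, and the zero risk-free rate are all assumptions, and `rolling_windows` and `sharpe_ratio` are hypothetical helper names.

```python
import numpy as np

def rolling_windows(n_steps: int, train_len: int, test_len: int):
    """Yield (train_slice, test_slice) pairs that roll forward by test_len."""
    start = 0
    while start + train_len + test_len <= n_steps:
        split = start + train_len
        # Interface selection and training would use only the train slice;
        # the test slice is held out for out-of-sample evaluation.
        yield slice(start, split), slice(split, split + test_len)
        start += test_len

def sharpe_ratio(returns, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    returns = np.asarray(returns, dtype=float)
    std = returns.std(ddof=1)
    if std == 0.0:
        return 0.0  # convention chosen here to avoid division by zero
    return float(np.sqrt(periods_per_year) * returns.mean() / std)
```

Under this scheme, any interface refinement would happen inside each training slice, and the selected interface would then be held fixed while the test slice is scored, matching the abstract's statement that no LLM queries or interface updates occur at test time.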

Problem

Research questions and friction points this paper is trying to address.

financial reinforcement learning

state-reward interface

non-stationary markets

portfolio trading

learning signal quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided interface

state-reward design

financial reinforcement learning