Macro Economists in the Machine: A Multi-Agent LLM Framework for Commodity-Related ETF Portfolio Construction

📅 2026-06-06
🤖 AI Summary
This study investigates whether large language models (LLMs) add value in commodity ETF portfolio construction when the information set and execution protocol are held fixed across strategies. The authors build a multi-agent framework in which a Hawkish Agent, a Dovish Agent, and a Debate (deliberative) Agent are compared against a deterministic rule-based agent. All four receive the same FRED macroeconomic z-scores and route their allocation tilts through the same portfolio engine. Over 124 weekly rebalancing dates, all three LLM strategies beat the Rule Agent in Sharpe terms, with the Hawkish and Debate Agents improving the Sharpe ratio by 0.044 and 0.040, respectively (p < 0.10 under a block bootstrap, unadjusted for multiple comparisons). These two agents keep a net-of-cost advantage over a passive inverse-volatility benchmark at one-way transaction costs of up to 30 basis points, whereas the Rule Agent's edge disappears at about 5 basis points. The outperformance is concentrated in the 2024-2025 soft-landing period, and the Debate Agent's main role is bias correction rather than generating additional return. Together, the results suggest that LLMs can add modest value as constrained macroeconomic interpretation functions, although the sample covers a single rate cycle.
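The pipeline summarized above (macro z-scores in, a tilt signal out, a shared engine that turns the signal into weights) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The 52-week z-score window, the inflation-minus-growth rule standing in for the Rule Agent, the per-ETF `sensitivity` values, and the multiplicative tilt on inverse-volatility base weights are all hypothetical choices. In this design an LLM agent would replace `rule_agent_signal` while everything downstream stays fixed, which mirrors the study's practice of holding the information set and implementation rules constant across strategies.

```python
import numpy as np


def zscore_latest(series: np.ndarray, window: int = 52) -> float:
    """Z-score of the latest value against a trailing window (window length is an assumption)."""
    hist = np.asarray(series, dtype=float)[-window:]
    std = hist.std(ddof=1)
    if std == 0:
        return 0.0  # flat history: treat as no signal
    return float((hist[-1] - hist.mean()) / std)


def rule_agent_signal(z_inflation: float, z_growth: float, cap: float = 2.0) -> float:
    """Hypothetical deterministic rule: inflation-minus-growth pressure scaled into [-1, 1]."""
    return float(np.clip((z_inflation - z_growth) / cap, -1.0, 1.0))


def inverse_vol_weights(returns: np.ndarray) -> np.ndarray:
    """Passive benchmark: long-only weights proportional to 1 / volatility of each ETF column."""
    inv = 1.0 / returns.std(axis=0, ddof=1)
    return inv / inv.sum()


def tilted_weights(base: np.ndarray, signal: float, sensitivity: np.ndarray,
                   strength: float = 0.5) -> np.ndarray:
    """Hypothetical shared engine: scale base weights by 1 + strength * signal * sensitivity.

    sensitivity[i] in [-1, 1] is an assumed inflation exposure for ETF i. Weights are
    clipped at zero and renormalized so the portfolio stays long-only and fully invested.
    """
    w = np.clip(base * (1.0 + strength * signal * sensitivity), 0.0, None)
    return w / w.sum()
```

With `strength = 0.5` and both `signal` and `sensitivity` bounded by 1 in absolute value, every multiplier stays at or above 0.5, so no ETF is ever tilted out of the portfolio entirely; a larger `strength` would allow full exits.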
📝 Abstract
We test whether large language models (LLMs) add value in commodity portfolio construction when the information set and implementation rules are held fixed across strategies. A Hawkish Agent (inflation-tightening prior), a Dovish Agent (growth-easing prior), a Debate Agent, and a deterministic z-score Rule Agent each receive identical FRED macro z-scores and route their tilt signals through the same portfolio engine. Across 124 weekly rebalancing dates spanning the 2023 U.S. rate peak and the 2024-2025 soft landing, all three LLM strategies outperform the Rule Agent in Sharpe terms. The Hawkish and Debate Agents record the largest gains (ΔSharpe = +0.044 and +0.040, both p < 0.10 under a block bootstrap) and preserve a net-of-cost advantage over the passive inverse-volatility benchmark at one-way trading costs up to 30 basis points, while the Rule Agent's thin margin over passive disappears at approximately 5 basis points. The Debate Agent does not outperform the best single agent (ΔSharpe = -0.004, p = 0.769); its contribution is bias correction (averaging out the Dovish Agent's miscalibrated prior) rather than deliberation-generated return. The performance advantage is concentrated in the soft-landing sub-period, the evaluation window spans a single rate cycle, and the reported p-values are unadjusted for multiple comparisons. Within these limits, the results suggest that an LLM acting as a constrained macro-interpretation function can add modest but economically meaningful value over a transparent rule layer, though the margin is small and its persistence beyond this sample is unknown.
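The evaluation quantities in the abstract (a Sharpe difference, a block-bootstrap p-value, and the one-way cost at which an advantage disappears) can be made concrete with the sketch below. It is not the paper's procedure: it assumes a paired circular block bootstrap with a 4-week block length, a one-sided test of H0: ΔSharpe ≤ 0, per-period (unannualized) Sharpe ratios on excess returns, and costs charged as one-way basis points times portfolio turnover. The abstract does not specify the bootstrap variant, block length, Sharpe convention, or cost model, so all of these, along with the function names, are hypothetical.

```python
from typing import Optional

import numpy as np


def sharpe(r: np.ndarray) -> float:
    """Per-period Sharpe ratio of excess returns (multiply by sqrt(52) to annualize weekly data)."""
    return float(r.mean() / r.std(ddof=1))


def net_returns(gross: np.ndarray, weights: np.ndarray, cost_bps: float) -> np.ndarray:
    """Deduct one-way costs: cost_bps * turnover_t, with turnover_t = sum_i |w[t, i] - w[t-1, i]|.

    Assumes weights[t] are the target weights set at rebalance t; drift between
    rebalances and the initial buy-in cost are ignored (simplifying assumptions).
    """
    turnover = np.abs(np.diff(weights, axis=0, prepend=weights[:1])).sum(axis=1)
    return gross - (cost_bps / 1e4) * turnover


def breakeven_cost_bps(gross_a: np.ndarray, w_a: np.ndarray,
                       gross_b: np.ndarray, w_b: np.ndarray,
                       grid: Optional[np.ndarray] = None) -> Optional[float]:
    """Smallest one-way cost (bps) on the grid at which strategy a no longer beats b in Sharpe."""
    grid = np.arange(0.0, 100.5, 0.5) if grid is None else grid
    for c in grid:
        if sharpe(net_returns(gross_a, w_a, c)) <= sharpe(net_returns(gross_b, w_b, c)):
            return float(c)
    return None  # a is still ahead at every cost on the grid


def block_bootstrap_pvalue(r_a: np.ndarray, r_b: np.ndarray, block: int = 4,
                           n_boot: int = 2000, seed: int = 0) -> float:
    """One-sided p-value for H0: Sharpe(a) - Sharpe(b) <= 0 via a paired circular block bootstrap."""
    rng = np.random.default_rng(seed)
    t = len(r_a)
    observed = sharpe(r_a) - sharpe(r_b)
    n_blocks = int(np.ceil(t / block))
    boot = np.empty(n_boot)
    for i in range(n_boot):
        # Random block starts; indices wrap past the end of the sample (circular scheme)
        starts = rng.integers(0, t, size=n_blocks)
        idx = ((starts[:, None] + np.arange(block)) % t).ravel()[:t]
        # Resample both strategies on the same dates to preserve their cross-correlation
        boot[i] = sharpe(r_a[idx]) - sharpe(r_b[idx])
    # Recentre at the observed difference so the bootstrap distribution mimics the null
    return float(np.mean(boot - observed >= observed))
```

For example, calling `breakeven_cost_bps` with a rule strategy's weekly gross returns and weight history as `a` and the passive benchmark's as `b` would locate the kind of threshold the abstract reports for the Rule Agent (roughly 5 basis points). A low-turnover passive portfolio pays little for each basis point of cost, which is why a thin gross Sharpe edge from a higher-turnover strategy can vanish quickly as costs rise.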
Problem

Research questions and friction points this paper is trying to address.

large language models
commodity portfolio construction
macroeconomic interpretation
ETF
investment strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent LLM
commodity ETF portfolio
macro interpretation
debate agent
Sharpe ratio outperformance