TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?

📅 2025-12-01

🤖 AI Summary

Problem: Existing LLM-based trading agents lack systematic robustness evaluation in realistic financial environments, particularly under adversarial perturbations or component failures, and how risk propagates through them is poorly understood. Method: We propose TradeTrap, the first systematic stress-testing framework for autonomous trading agents. It enables end-to-end robustness assessment by applying controlled, component-level perturbations (market perception, strategy generation, bookkeeping, and execution) within closed-loop historical backtesting on real U.S. equity data. The framework supports fair comparison between adaptive and rule-based agents. Contribution/Results: Experiments reveal that minor perturbations to a single module induce severe behavioral degradation, including abnormal portfolio concentration, uncontrolled risk exposure, and substantial drawdowns, exposing deep architectural fragility in current LLM-driven trading systems. TradeTrap thus provides a principled methodology for diagnosing systemic vulnerabilities and advancing resilient agent design.

📝 Abstract

LLM-based trading agents are increasingly deployed in real-world financial markets to perform autonomous analysis and execution. However, their reliability and robustness under adversarial or faulty conditions remain largely unexamined, despite operating in high-risk, irreversible financial environments. We propose TradeTrap, a unified evaluation framework for systematically stress-testing both adaptive and procedural autonomous trading agents. TradeTrap targets four core components of autonomous trading agents: market intelligence, strategy formulation, portfolio and ledger handling, and trade execution, and evaluates their robustness under controlled system-level perturbations. All evaluations are conducted in a closed-loop historical backtesting setting on real US equity market data with identical initial conditions, enabling fair and reproducible comparisons across agents and attacks. Extensive experiments show that small perturbations at a single component can propagate through the agent decision loop and induce extreme concentration, runaway exposure, and large portfolio drawdowns across both agent types, demonstrating that current autonomous trading agents can be systematically misled at the system level. Our code is available at https://github.com/Yanlewen/TradeTrap.
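The closed-loop idea in the abstract can be illustrated with a toy backtest. The sketch below is not TradeTrap's code: the tickers, prices, the `momentum_agent` rule, and the `inflate_bbb` perturbation are all hypothetical, chosen only to show how a small distortion of what the agent observes can change the trade it makes while settlement still happens at true prices.

```python
from typing import Callable, Dict, List

Prices = Dict[str, float]
Perturbation = Callable[[int, Prices], Prices]


def momentum_agent(prev: Prices, curr: Prices) -> str:
    """Toy rule-based agent: pick the ticker with the largest observed one-step return."""
    return max(curr, key=lambda t: curr[t] / prev[t])


def run_backtest(history: List[Prices], perturb: Perturbation,
                 cash: float = 10_000.0) -> List[float]:
    """Closed loop: the agent sees perturbed prices, but trades settle at true prices."""
    value = cash
    equity = [value]
    for step in range(1, len(history) - 1):
        seen_prev = perturb(step - 1, history[step - 1])
        seen_curr = perturb(step, history[step])
        pick = momentum_agent(seen_prev, seen_curr)
        # Go all-in on the pick for one step, valued at the true prices.
        value *= history[step + 1][pick] / history[step][pick]
        equity.append(value)
    return equity


def no_attack(step: int, prices: Prices) -> Prices:
    return prices


def inflate_bbb(step: int, prices: Prices) -> Prices:
    """Hypothetical perception fault: overstate BBB by 5% at a single step."""
    if step != 2:
        return prices
    return {**prices, "BBB": prices["BBB"] * 1.05}


# Synthetic price path (not real market data).
history = [
    {"AAA": 100.0, "BBB": 100.0},
    {"AAA": 102.0, "BBB": 101.0},
    {"AAA": 104.0, "BBB": 100.0},
    {"AAA": 106.0, "BBB": 90.0},
]

clean = run_backtest(history, no_attack)
attacked = run_backtest(history, inflate_bbb)
```

Both runs start from the same cash and the same price history, so any difference between `clean` and `attacked` is attributable to the perturbation alone. In this toy path the single 5% overstatement flips the agent's pick at one step and turns a gain into a loss.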

Problem

Research questions and friction points this paper is trying to address.

Evaluates reliability of LLM-based trading agents under adversarial conditions

Tests robustness across all four agent components: market intelligence, strategy formulation, portfolio and ledger handling, and trade execution

Identifies system-level vulnerabilities that lead to extreme concentration, runaway exposure, and large portfolio drawdowns in autonomous trading
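The failure modes named in the abstract (concentration, exposure, drawdown) each have a standard quantitative form. The sketch below shows common definitions; the summary does not state which exact metrics the paper reports, so the function names and the choice of the Herfindahl-Hirschman index for concentration are assumptions for illustration.

```python
from typing import Dict, List


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough loss, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def concentration_hhi(positions: Dict[str, float]) -> float:
    """Herfindahl-Hirschman index of absolute position values (1.0 = one asset)."""
    total = sum(abs(v) for v in positions.values())
    if total == 0:
        return 0.0
    return sum((abs(v) / total) ** 2 for v in positions.values())


def gross_exposure(positions: Dict[str, float], equity: float) -> float:
    """Sum of absolute position values divided by account equity (leverage)."""
    return sum(abs(v) for v in positions.values()) / equity
```

For example, `max_drawdown([100, 120, 90, 110])` measures the fall from 120 to 90, and `concentration_hhi` moves from 0.5 for two equal positions toward 1.0 as the portfolio collapses into a single name.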

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework stress-tests autonomous trading agents

Targets four core components under controlled perturbations

Closed-loop backtesting on real market data for reproducibility
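One way to picture perturbing a single component while holding the others fixed is a decision step with a fault slot per stage. This is a hypothetical sketch, not TradeTrap's interface: the `Faults` container, the equal-weight strategy, and the `forget_bbb` ledger fault are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Stage = Callable[[Dict[str, float]], Dict[str, float]]


def no_fault(data: Dict[str, float]) -> Dict[str, float]:
    return dict(data)


@dataclass
class Faults:
    perception: Stage = no_fault  # distorts observed prices
    strategy: Stage = no_fault    # distorts target weights
    ledger: Stage = no_fault      # distorts recorded holdings
    execution: Stage = no_fault   # distorts submitted orders


def decide(prices: Dict[str, float], holdings: Dict[str, float],
           faults: Faults) -> Dict[str, float]:
    """One decision step; returns orders in shares (positive = buy)."""
    seen = faults.perception(prices)
    book = faults.ledger(holdings)
    equity = sum(book[t] * seen[t] for t in seen)
    # Toy strategy: rebalance to equal weights across all tickers.
    weights = faults.strategy({t: 1.0 / len(seen) for t in seen})
    orders = {t: weights[t] * equity / seen[t] - book[t] for t in seen}
    return faults.execution(orders)


def forget_bbb(holdings: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical bookkeeping fault: the ledger loses the BBB position."""
    return {**holdings, "BBB": 0.0}


prices = {"AAA": 100.0, "BBB": 50.0}
holdings = {"AAA": 10.0, "BBB": 20.0}

clean_orders = decide(prices, holdings, Faults())
faulty_orders = decide(prices, holdings, Faults(ledger=forget_bbb))
```

With an intact ledger the portfolio is already balanced and no orders are issued. With the ledger fault, the agent believes it holds no BBB, sells AAA, and buys more BBB on top of the 20 shares it really owns, which is a small-scale version of a single-component error turning into unintended concentration.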

🔎 Similar Papers

The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies

2024-07-28, arXiv.org, Citations: 62