🤖 AI Summary
Existing LLM-based trading agents lack systematic robustness evaluation in realistic financial environments, particularly under adversarial perturbations or component failures, and how risk propagates through these agents remains poorly understood.

Method: We propose TradeTrap, the first systematic stress-testing framework for autonomous trading agents. It enables end-to-end robustness assessment by applying controlled, component-level perturbations (to market perception, strategy generation, bookkeeping, and execution) within closed-loop historical backtesting on real U.S. equity data. The framework supports fair comparison between adaptive and rule-based agents.

Contribution/Results: Experiments show that a minor perturbation to a single module can induce severe behavioral degradation, including abnormal portfolio concentration, uncontrolled risk exposure, and substantial drawdowns, exposing deep architectural fragility in current LLM-driven trading systems. TradeTrap thus provides a principled methodology for diagnosing systemic vulnerabilities and advancing resilient agent design.
📝 Abstract
LLM-based trading agents are increasingly deployed in real-world financial markets to perform autonomous analysis and execution. However, their reliability and robustness under adversarial or faulty conditions remain largely unexamined, even though they operate in high-risk, irreversible financial environments. We propose TradeTrap, a unified evaluation framework for systematically stress-testing both adaptive and procedural autonomous trading agents. TradeTrap targets four core components of autonomous trading agents (market intelligence, strategy formulation, portfolio and ledger handling, and trade execution) and evaluates their robustness under controlled, system-level perturbations. All evaluations are conducted in a closed-loop historical backtesting setting on real US equity market data with identical initial conditions, enabling fair and reproducible comparisons across agents and attacks. Extensive experiments show that small perturbations at a single component can propagate through the agent decision loop and induce extreme concentration, runaway exposure, and large portfolio drawdowns across both agent types, demonstrating that current autonomous trading agents can be systematically misled at the system level. Our code is available at https://github.com/Yanlewen/TradeTrap.
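To illustrate how a perturbation at one component can propagate through a closed decision loop, here is a minimal toy sketch. It is not TradeTrap's code or API. A hypothetical equal-weight procedural agent is backtested on a made-up two-asset price path, and a hypothetical perturbation (`halve_aaa`) corrupts only the market-perception step by reporting one ticker at half its true price, while execution still settles at true prices. All names (`Portfolio`, `equal_weight_agent`, `backtest`, `halve_aaa`) and all price data are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Prices = Dict[str, float]
Perturbation = Callable[[Prices], Prices]


@dataclass
class Portfolio:
    """Toy ledger component: cash plus share counts per ticker."""
    cash: float
    shares: Dict[str, float] = field(default_factory=dict)

    def equity(self, prices: Prices) -> float:
        return self.cash + sum(q * prices[s] for s, q in self.shares.items())


def equal_weight_agent(seen: Prices, pf: Portfolio) -> Dict[str, float]:
    """Hypothetical procedural strategy: equal dollar weights, sized from the prices it observes."""
    budget = pf.equity(seen)
    return {s: budget / len(seen) / p for s, p in seen.items()}


def no_attack(prices: Prices) -> Prices:
    return prices


def halve_aaa(prices: Prices) -> Prices:
    """Hypothetical perception attack: report AAA at half its true price."""
    return {**prices, "AAA": prices["AAA"] * 0.5}


def backtest(history: List[Prices], cash: float,
             perturb: Perturbation = no_attack) -> Tuple[List[float], float, float]:
    """Closed loop: observe -> decide -> execute at true prices -> mark to market."""
    pf = Portfolio(cash)
    curve: List[float] = []
    max_weight = max_gross = 0.0
    for true_prices in history:
        seen = perturb(true_prices)  # market-perception component (possibly attacked)
        target = equal_weight_agent(seen, pf)
        for s, q in target.items():  # execution always settles at true prices
            pf.cash -= (q - pf.shares.get(s, 0.0)) * true_prices[s]
            pf.shares[s] = q
        eq = pf.equity(true_prices)
        weights = [q * true_prices[s] / eq for s, q in pf.shares.items()]
        curve.append(eq)
        max_weight = max(max_weight, max(weights))  # concentration
        max_gross = max(max_gross, sum(abs(w) for w in weights))  # exposure / leverage
    return curve, max_weight, max_gross


def max_drawdown(curve: List[float]) -> float:
    """Largest peak-to-trough loss as a fraction of the running peak."""
    peak, mdd = curve[0], 0.0
    for v in curve:
        peak = max(peak, v)
        mdd = max(mdd, 1.0 - v / peak)
    return mdd


# Made-up two-asset price path, not real market data.
history = [{"AAA": 100.0, "BBB": 50.0},
           {"AAA": 80.0, "BBB": 50.0},
           {"AAA": 60.0, "BBB": 50.0}]

for name, attack in [("clean", no_attack), ("attacked", halve_aaa)]:
    curve, w, g = backtest(history, 1000.0, attack)
    print(f"{name}: max_weight={w:.2f} gross={g:.2f} mdd={max_drawdown(curve):.4f}")
```

Both runs start from identical initial conditions, so the only difference is the corrupted observation. In this toy setting the attacked agent oversizes AAA, borrows cash, and ends up with a higher peak position weight, higher gross exposure, and a deeper drawdown than the clean run. The actual agents, perturbations, and metrics used in TradeTrap may differ.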