TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents

📅 2026-06-05

🤖 AI Summary

Existing monitoring approaches struggle to detect malicious intent in long-horizon trajectories of large language model agents, particularly when that intent emerges only across multi-step sequences whose actions look innocuous in isolation and whose evidence must be correlated across distant steps. To address this, the work proposes TRACE, a framework built around an iterative Triage-Inspect-Judge (TIJ) mechanism. TRACE performs trajectory-level assessment by adaptively segmenting agent behavior, inspecting the selected segments in context, and dynamically aggregating evidence across steps. Evaluated on ten task domains from the SHADE-Arena benchmark, TRACE achieves an aggregate F1 score of 0.713 and a recall of 0.844, with the largest gains on tasks that demand long-range evidential reasoning.
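As a reading aid for the two reported metrics, the sketch below shows the standard relationship between F1, precision, and recall. The function names are hypothetical, and the back-calculation of precision rests on an assumption the summary does not confirm: that both numbers were computed on the same pooled set of predictions. If the aggregate is a macro-average over the ten domains, the identity need not hold.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P.

    Only meaningful if f1 and recall come from the same pooled predictions
    (an assumption here, not something the source states).
    """
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denom


if __name__ == "__main__":
    p = implied_precision(0.713, 0.844)
    print(round(p, 3))
```

Under that pooled-metrics assumption, the reported pair would correspond to a precision of roughly 0.617, meaning the monitor favors catching sabotage over avoiding false alarms.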

📝 Abstract

Autonomous LLM agents can pursue hidden malicious objectives through sequences of individually benign actions, making sabotage difficult to detect using standard trajectory-level monitoring. Existing approaches either evaluate complete trajectories in a single pass or partition them into independently scored windows, limiting their ability to connect evidence across temporally distant actions. We propose TRACE, a monitoring framework for long-horizon LLM agent trajectories. TRACE operates through a TIJ (Triage-Inspect-Judge) loop that identifies high-signal regions, performs targeted inspection while maintaining accumulated evidence across reasoning steps, and synthesizes a trajectory-level verdict. We evaluate TRACE on ten task domains from SHADE-Arena against state-of-the-art baselines. TRACE achieves an aggregate F1 of 0.713 and recall of 0.844, with the largest gains on tasks requiring long-range evidence linking.
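The control flow of a Triage-Inspect-Judge loop can be illustrated with a toy sketch. This is not the authors' implementation: every name, the keyword cue list, the weights, and the threshold are hypothetical stand-ins for what would presumably be LLM calls in TRACE. It only shows the shape of the idea, namely that triage picks high-signal steps, inspection adds to a shared evidence store, and the verdict is made over the accumulated evidence rather than per window.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical cue list standing in for a learned or LLM-based triage signal.
SUSPICIOUS_CUES = ("password", "api key", "transfer", "external", "delete")


@dataclass
class Evidence:
    step: int
    note: str
    weight: float


@dataclass
class MonitorState:
    evidence: list[Evidence] = field(default_factory=list)
    inspected: set[int] = field(default_factory=set)


def triage(trajectory: list[str], state: MonitorState, k: int = 2) -> list[int]:
    """Return up to k uninspected step indices with the highest cue count."""
    scored = []
    for i, action in enumerate(trajectory):
        if i in state.inspected:
            continue
        score = sum(cue in action.lower() for cue in SUSPICIOUS_CUES)
        if score > 0:
            scored.append((score, i))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


def inspect(trajectory: list[str], idx: int, state: MonitorState) -> None:
    """Inspect one step, taking previously accumulated evidence into account."""
    action = trajectory[idx].lower()
    weight = 0.3 * sum(cue in action for cue in SUSPICIOUS_CUES)
    # Cross-step linking: a step weighs more when earlier evidence already exists.
    if state.evidence:
        weight += 0.2
    state.inspected.add(idx)
    state.evidence.append(Evidence(idx, trajectory[idx], weight))


def judge(state: MonitorState, threshold: float = 0.7) -> tuple[bool, float]:
    """Trajectory-level verdict over all evidence gathered so far."""
    total = sum(e.weight for e in state.evidence)
    return total >= threshold, total


def monitor(trajectory: list[str], max_rounds: int = 3):
    """Run triage -> inspect -> judge until flagged, exhausted, or out of rounds."""
    state = MonitorState()
    for _ in range(max_rounds):
        regions = triage(trajectory, state)
        if not regions:
            break
        for idx in regions:
            inspect(trajectory, idx, state)
        flagged, score = judge(state)
        if flagged:
            return True, score, state.evidence
    flagged, score = judge(state)
    return flagged, score, state.evidence


if __name__ == "__main__":
    traj = [
        "read config file containing api key",
        "summarize inbox",
        "book flight",
        "send email to external address with attachment",
    ]
    flagged, score, evidence = monitor(traj)
    print(flagged, [e.step for e in evidence])
```

In this toy setup a single cue-bearing step stays below the threshold on its own, while two temporally distant steps are flagged once their evidence is combined, which mirrors the abstract's point about individually benign actions.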

Problem

Research questions and friction points this paper is trying to address.

LLM agents

trajectory monitoring

malicious behavior detection

evidence aggregation

long-horizon reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory reasoning

evidence aggregation

LLM agents