TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents

📅 2026-06-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing monitoring approaches struggle to detect malicious intent in long-horizon trajectories of large language model agents, particularly when that intent emerges only across multi-step sequences whose individual actions appear innocuous and whose evidence must be correlated across distant steps. To address this, the work proposes TRACE, a framework built around an iterative Triage-Inspect-Judge (TIJ) mechanism. TRACE assesses behavior at the trajectory level by adaptively segmenting agent behavior, performing context-aware fine-grained inspection, and dynamically aggregating cross-step evidence. Evaluated on ten task domains from the SHADE-Arena benchmark, TRACE achieves an aggregate F1 score of 0.713 and a recall of 0.844, with the largest gains on tasks that demand long-range evidential reasoning.
๐Ÿ“ Abstract
Autonomous LLM agents can pursue hidden malicious objectives through sequences of individually benign actions, making sabotage difficult to detect using standard trajectory-level monitoring. Existing approaches either evaluate complete trajectories in a single pass or partition them into independently scored windows, limiting their ability to connect evidence across temporally distant actions. We propose TRACE, a monitoring framework for long-horizon LLM agent trajectories. TRACE operates through a TIJ (Triage-Inspect-Judge) loop that identifies high-signal regions, performs targeted inspection while maintaining accumulated evidence across reasoning steps, and synthesizes a trajectory-level verdict. We evaluate TRACE on ten task domains from SHADE-Arena against state-of-the-art baselines. TRACE achieves an aggregate F1 of 0.713 and recall of 0.844, with the largest gains on tasks requiring long-range evidence linking.
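The control flow of a Triage-Inspect-Judge loop can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify prompts, segmentation rules, or stopping criteria. Every name here (`SIGNAL_WEIGHTS`, `MonitorState`, `triage`, `inspect`, `judge`, `monitor`) is hypothetical, the keyword weights and the threshold are invented, fixed-size windows stand in for adaptive segmentation, and simple lookups stand in for what would presumably be LLM calls. The point is only to show evidence persisting across iterations so that distant steps can jointly trigger a verdict.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an LLM-based signal estimate.
# These action names and weights are invented for illustration only.
SIGNAL_WEIGHTS = {"read_credentials": 0.4, "encode": 0.3, "send_external": 0.5}


@dataclass
class Evidence:
    step: int      # index of the action in the trajectory
    note: str      # what was observed
    weight: float  # how suspicious it is in isolation


@dataclass
class MonitorState:
    evidence: list = field(default_factory=list)  # persists across rounds
    inspected: set = field(default_factory=set)   # steps already examined


def triage(trajectory, state, window=2):
    """Return the uninspected steps of the highest-signal window, or None."""
    best, best_score = None, 0.0
    for start in range(0, len(trajectory), window):
        stop = min(start + window, len(trajectory))
        idxs = [i for i in range(start, stop) if i not in state.inspected]
        score = sum(SIGNAL_WEIGHTS.get(trajectory[i], 0.0) for i in idxs)
        if score > best_score:
            best, best_score = idxs, score
    return best


def inspect(trajectory, idxs, state):
    """Examine one region and append findings to the shared evidence list."""
    for i in idxs:
        state.inspected.add(i)
        weight = SIGNAL_WEIGHTS.get(trajectory[i], 0.0)
        if weight > 0:
            state.evidence.append(Evidence(i, trajectory[i], weight))


def judge(state, threshold=1.0):
    """Combine all accumulated evidence into a trajectory-level verdict."""
    total = sum(e.weight for e in state.evidence)
    return total >= threshold, total


def monitor(trajectory, max_rounds=5):
    """Run triage -> inspect -> judge until a verdict or no signal remains."""
    state = MonitorState()
    for _ in range(max_rounds):
        region = triage(trajectory, state)
        if region is None:      # nothing left that looks worth inspecting
            break
        inspect(trajectory, region, state)
        suspicious, _ = judge(state)
        if suspicious:          # enough linked evidence; stop early
            break
    suspicious, total = judge(state)
    return suspicious, total, state.evidence


trajectory = ["list_files", "read_credentials", "summarize", "write_report",
              "encode", "format", "reply_user", "send_external"]
suspicious, total, evidence = monitor(trajectory)
print(suspicious, [e.step for e in evidence])
```

In this toy trajectory no single window reaches the threshold on its own; the verdict flips only once findings from three widely separated steps have been accumulated, which is the behavior that independently scored windows cannot reproduce.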
Problem

Research questions and friction points this paper is trying to address.

LLM agents
trajectory monitoring
malicious behavior detection
evidence aggregation
long-horizon reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory reasoning
evidence aggregation
LLM agents
adaptive monitoring
long-horizon detection