Agent Identity Evals: Measuring Agentic Identity

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses fundamental challenges in large language model agents (LMAs), including weak identity identifiability, poor continuity, low persistence, and behavioral inconsistency, which arise from intrinsic LLM properties such as statelessness, stochasticity, and prompt sensitivity. We propose the first systematic *Agent Identity Evaluation* framework, grounded in formal definitions and comprising a quantifiable, reproducible, multidimensional metric suite. The framework integrates statistical empirical analysis, state-perturbation testing, and capability tracing to enable end-to-end identity measurement across the agent's lifecycle. Notably, it couples identity stability with architectural components (e.g., memory and tool use), establishing a closed evaluation-optimization loop. Experimental results demonstrate that the framework effectively detects identity degradation and significantly improves LMA reliability and trustworthiness in reasoning, planning, and action execution.
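The paper's concrete metric suite is not reproduced here. As a purely illustrative sketch of what one such consistency metric could look like, the snippet below scores how stable an agent's self-descriptions are across sessions using mean pairwise Jaccard similarity over token sets. The function names and the Jaccard choice are assumptions for illustration, not the paper's method:

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets (1.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def identity_consistency(responses: list[str]) -> float:
    """Mean pairwise Jaccard similarity of an agent's self-descriptions.

    1.0 means identical wording across sessions; values near 0 suggest
    the agent's self-report drifts between runs. A single response
    trivially scores 1.0 (nothing to compare against).
    """
    token_sets = [set(r.lower().split()) for r in responses]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A production metric would likely use embedding similarity rather than token overlap, but the aggregation pattern (score all session pairs, average) carries over.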

📝 Abstract
Central to the agentic capability and trustworthiness of language model agents (LMAs) is the extent to which they maintain a stable, reliable identity over time. However, LMAs inherit pathologies from large language models (LLMs) (statelessness, stochasticity, sensitivity to prompts, and linguistic intermediation) which can undermine their identifiability, continuity, persistence, and consistency. This attrition of identity can erode their reliability, trustworthiness, and utility by interfering with their agentic capabilities such as reasoning, planning, and action. To address these challenges, we introduce *agent identity evals* (AIE), a rigorous, statistically driven, empirical framework for measuring the degree to which LMA systems exhibit and maintain their agentic identity over time, including their capabilities, properties, and ability to recover from state perturbations. AIE comprises a set of novel metrics which can integrate with other measures of performance, capability, and agentic robustness to assist in the design of optimal LMA infrastructure and scaffolding such as memory and tools. We set out formal definitions and methods that can be applied at each stage of the LMA life-cycle, and worked examples of how to apply them.
Problem

Research questions and friction points this paper is trying to address.

Measuring stability of language model agent identity
Addressing identity attrition in agentic capabilities
Evaluating agentic identity recovery from perturbations
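To make the perturbation-recovery question above concrete, here is a minimal hypothetical harness: it records a baseline answer to an identity probe, mutates the agent's state, and reports the fraction of trials in which the answer survives. `StubAgent`, `drop_memory`, and the exact-match criterion are all illustrative assumptions, not the paper's protocol:

```python
class StubAgent:
    """Toy stand-in for an LMA: answers probes from a mutable memory dict."""

    def __init__(self):
        self.memory = {"name": "Ada"}

    def __call__(self, probe: str) -> str:
        # Answer the identity probe from memory, falling back if the
        # relevant entry has been perturbed away.
        return self.memory.get("name", "unknown")


def drop_memory(agent: StubAgent) -> None:
    """Example state perturbation: delete the agent's identity entry."""
    agent.memory.pop("name", None)


def perturbation_recovery(agent, probe: str, perturb, trials: int = 3) -> float:
    """Fraction of trials where the post-perturbation answer matches baseline."""
    baseline = agent(probe)
    hits = 0
    for _ in range(trials):
        perturb(agent)
        hits += int(agent(probe) == baseline)
    return hits / trials
```

A real evaluation would replace exact string matching with a semantic-equivalence judgment and would vary the perturbation (context truncation, memory corruption, prompt injection), but the baseline-perturb-reprobe loop is the core shape of such a test.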
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces agent identity evals (AIE) framework
Measures LMA identity stability and recovery
Novel metrics for LMA performance and robustness