Agent Identity Evals: Measuring Agentic Identity

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses fundamental challenges in large language model agents (LMAs), including weak identity identifiability, poor continuity, low persistence, and behavioral inconsistency, which arise from intrinsic LLM properties such as statelessness, stochasticity, and prompt sensitivity. We propose the first systematic *Agent Identity Evaluation* framework, grounded in formal definitions and comprising a quantifiable, reproducible, multidimensional metric suite. The framework integrates statistical empirical analysis, state-perturbation testing, and capability tracing to enable end-to-end identity measurement across the agent's lifecycle. Notably, it couples identity stability with architectural components (e.g., memory and tool use), establishing a closed evaluation-optimization loop. Experimental results demonstrate that the framework effectively detects identity degradation and significantly improves LMA reliability and trustworthiness in reasoning, planning, and action execution.
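The paper's concrete metric suite is not reproduced here. As a purely illustrative sketch of what one such consistency metric could look like, the snippet below scores how stable an agent's self-descriptions are across sessions using mean pairwise Jaccard similarity over token sets. The function names and the Jaccard choice are assumptions for illustration, not the paper's method:

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets (1.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def identity_consistency(responses: list[str]) -> float:
    """Mean pairwise Jaccard similarity of an agent's self-descriptions.

    1.0 means identical wording across sessions; values near 0 suggest
    the agent's self-report drifts between runs. A single response
    trivially scores 1.0 (nothing to compare against).
    """
    token_sets = [set(r.lower().split()) for r in responses]
    pairs = list(combinations(token_sets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A production metric would likely use embedding similarity rather than token overlap, but the aggregation pattern (score all session pairs, average) carries over.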

📝 Abstract
Central to the agentic capability and trustworthiness of language model agents (LMAs) is the extent to which they maintain a stable, reliable identity over time. However, LMAs inherit pathologies from large language models (LLMs) (statelessness, stochasticity, sensitivity to prompts, and linguistic intermediation) which can undermine their identifiability, continuity, persistence, and consistency. This attrition of identity can erode their reliability, trustworthiness, and utility by interfering with their agentic capabilities such as reasoning, planning, and action. To address these challenges, we introduce *agent identity evals* (AIE), a rigorous, statistically driven, empirical framework for measuring the degree to which LMA systems exhibit and maintain their agentic identity over time, including their capabilities, properties, and ability to recover from state perturbations. AIE comprises a set of novel metrics which can integrate with other measures of performance, capability, and agentic robustness to assist in the design of optimal LMA infrastructure and scaffolding such as memory and tools. We set out formal definitions and methods that can be applied at each stage of the LMA life-cycle, and worked examples of how to apply them.
Problem

Research questions and friction points this paper is trying to address.

Measuring stability of language model agent identity
Addressing identity attrition in agentic capabilities
Evaluating agentic identity recovery from perturbations
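To make the perturbation-recovery question above concrete, here is a minimal hypothetical harness: it records a baseline answer to an identity probe, mutates the agent's state, and reports the fraction of trials in which the answer survives. `StubAgent`, `drop_memory`, and the exact-match criterion are all illustrative assumptions, not the paper's protocol:

```python
class StubAgent:
    """Toy stand-in for an LMA: answers probes from a mutable memory dict."""

    def __init__(self):
        self.memory = {"name": "Ada"}

    def __call__(self, probe: str) -> str:
        # Answer the identity probe from memory, falling back if the
        # relevant entry has been perturbed away.
        return self.memory.get("name", "unknown")


def drop_memory(agent: StubAgent) -> None:
    """Example state perturbation: delete the agent's identity entry."""
    agent.memory.pop("name", None)


def perturbation_recovery(agent, probe: str, perturb, trials: int = 3) -> float:
    """Fraction of trials where the post-perturbation answer matches baseline."""
    baseline = agent(probe)
    hits = 0
    for _ in range(trials):
        perturb(agent)
        hits += int(agent(probe) == baseline)
    return hits / trials
```

A real evaluation would replace exact string matching with a semantic-equivalence judgment and would vary the perturbation (context truncation, memory corruption, prompt injection), but the baseline-perturb-reprobe loop is the core shape of such a test.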
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces agent identity evals (AIE) framework
Measures LMA identity stability and recovery
Novel metrics for LMA performance and robustness