🤖 AI Summary
Current AI agents, particularly large language models (LLMs), model only the statistical average of human understanding and lack a fine-grained characterization of individualized, context-sensitive comprehension. This limitation results in weak controllability in human–AI interaction and a heavy reliance on labor-intensive prompt engineering. Method: The paper argues for a domain-agnostic, cognition-driven framework for measuring understandability, grounded in cognitive psychology. It surveys fragmented efforts across domains and works toward computable, cross-domain dimensions for assessing understanding that do not depend on specific models or training data. Contribution/Results: Such measures could serve as directives for agents, enabling dynamic, fine-grained, human-centered guidance beyond what training data alone provides. The work lays groundwork, general design principles, and benchmarking directions for next-generation intelligent agents that are both explainable and steerable, advancing the foundation for cognitively aligned AI systems.
📝 Abstract
Successful agent–human partnerships require that any agent-generated information be understandable to the human, and that the human can easily steer the agent toward a goal. Such effective communication requires the agent to develop a finer-grained notion of what is understandable to the human. State-of-the-art agents, including LLMs, lack this detailed notion of understandability because they capture only average human sensibilities from their training data, and therefore afford limited steerability (e.g., requiring non-trivial prompt engineering). In this paper, instead of relying only on data, we argue for developing generalizable, domain-agnostic measures of understandability that can serve as directives for these agents. Because existing research on understandability measures is fragmented, we survey such efforts across domains and lay a cognitive-science-rooted groundwork for more coherent, domain-agnostic investigations in the future.