AI Summary
This study investigates whether large language models (LLMs) employ human-like cognitive strategies in semantic memory retrieval, with a focus on memory search mechanisms during semantic fluency tasks. Leveraging mechanistic interpretability methods, the work integrates cross-layer behavioral pattern tracking with semantic fluency analysis to identify, for the first time in LLMs, convergent and divergent memory search behaviors that align with human cognition. The findings reveal strategic similarities between model and human semantic foraging, suggesting that the degree of cognitive alignment can be modulated to enhance human-AI collaboration. This insight opens new avenues for designing artificial intelligence systems that are either cognitively aligned with or complementary to human memory processes.
Abstract
Both humans and Large Language Models (LLMs) store a vast repository of semantic memories. In humans, efficient and strategic access to this memory store is a critical foundation for a variety of cognitive functions. Such access has long been a focus of psychology, and the computational mechanisms behind it are now well characterized. Much of this understanding has been gleaned from a widely used neuropsychological and cognitive science assessment called the Semantic Fluency Task (SFT), which requires the generation of as many semantically constrained concepts as possible. Our goal is to apply mechanistic interpretability techniques to bring greater rigor to the study of semantic memory foraging in LLMs. To this end, we present preliminary results examining SFT as a case study. A central focus is on convergent and divergent patterns of generative memory search, which in humans play complementary strategic roles in efficient memory foraging. We show that these same behavioral signatures, critical to human performance on the SFT, also emerge as identifiable patterns in LLMs across distinct layers. Potentially, this analysis provides new insights into how LLMs may be adapted into closer cognitive alignment with humans, or alternatively, guided toward productive cognitive disalignment to enhance complementary strengths in human-AI interaction.