Visual Large Language Models Exhibit Human-Level Cognitive Flexibility in the Wisconsin Card Sorting Test

📅 2025-05-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Cognitive flexibility, the ability to adapt behavior to changing task rules, remains a critical yet underexplored capability in Visual Large Language Models (VLLMs). Method: This study conducts the first systematic evaluation of GPT-4o, Gemini-1.5 Pro, and Claude-3.5 Sonnet on the Wisconsin Card Sorting Test (WCST), employing chain-of-thought (CoT) prompting, multimodal input variations, and role-playing simulations of the error patterns characteristic of frontal lobe lesions. Contribution/Results: Under text-only + CoT conditions, VLLMs achieve category-switching accuracy comparable to neurotypical adults. Input modality and prompting strategy significantly modulate performance. Crucially, all models robustly reproduce clinically characteristic perseverative errors (repeated adherence to outdated rules), suggesting that VLLMs may share a set-shifting architecture with the brain. These findings indicate that VLLMs possess human-level abstract rule-shifting capacity and point to their viability as computational models of neurocognitive function, opening a new avenue for AI-driven cognitive neuroscience.

📝 Abstract
Cognitive flexibility has been extensively studied in human cognition but remains relatively unexplored in the context of Visual Large Language Models (VLLMs). This study assesses the cognitive flexibility of state-of-the-art VLLMs (GPT-4o, Gemini-1.5 Pro, and Claude-3.5 Sonnet) using the Wisconsin Card Sorting Test (WCST), a classic measure of set-shifting ability. Our results reveal that VLLMs achieve or surpass human-level set-shifting capabilities under chain-of-thought prompting with text-based inputs. However, their abilities are highly influenced by both input modality and prompting strategy. In addition, we find that, through role-playing, VLLMs can simulate functional deficits aligned with those of patients with impaired cognitive flexibility, suggesting that VLLMs may possess a cognitive architecture similar to the brain's, at least with respect to set-shifting. This study shows that VLLMs have already approached human-level performance on a key component of higher cognition and highlights their potential for emulating complex brain processes.
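To make the task structure concrete, here is a minimal sketch of a WCST session: an agent sorts stimulus cards against four key cards, the hidden rule (color, shape, or number) switches without warning after a run of correct responses, and responses that still follow the superseded rule are scored as perseverative errors. The agent interface, scoring details, and trial counts are illustrative assumptions, not the paper's exact protocol.

```python
import random

# Classic WCST key cards: (color, shape, number).
KEY_CARDS = [("red", "triangle", 1), ("green", "star", 2),
             ("yellow", "cross", 3), ("blue", "circle", 4)]
RULES = ("color", "shape", "number")
DIM = {"color": 0, "shape": 1, "number": 2}

def correct_key(card, rule):
    """Index of the key card that matches `card` on the active dimension."""
    return next(i for i, k in enumerate(KEY_CARDS)
                if k[DIM[rule]] == card[DIM[rule]])

def run_wcst(agent, n_trials=64, switch_after=10, seed=0):
    """Run one WCST session (illustrative scoring, not the paper's exact
    protocol). `agent(card, feedback)` returns a key-card index; the rule
    switches, unannounced, after `switch_after` consecutive correct
    responses. Returns (categories_completed, perseverative_errors)."""
    rng = random.Random(seed)
    rule_idx, prev_rule = 0, None
    streak, categories, persev = 0, 0, 0
    feedback = None
    for _ in range(n_trials):
        card = (rng.choice(("red", "green", "blue", "yellow")),
                rng.choice(("triangle", "star", "cross", "circle")),
                rng.choice((1, 2, 3, 4)))
        choice = agent(card, feedback)
        rule = RULES[rule_idx]
        if choice == correct_key(card, rule):
            streak, feedback = streak + 1, "correct"
            if streak == switch_after:           # category completed:
                categories += 1                  # switch the rule silently
                prev_rule, rule_idx = rule, (rule_idx + 1) % 3
                streak = 0
        else:
            # Perseverative error: the response still follows the old rule.
            if prev_rule and choice == correct_key(card, prev_rule):
                persev += 1
            streak, feedback = 0, "wrong"
    return categories, persev

class FlexibleAgent:
    """Hypothesis-testing baseline: keep the current candidate rule after
    positive feedback, move to the next rule after negative feedback."""
    def __init__(self):
        self.rule_idx = 0
    def __call__(self, card, feedback):
        if feedback == "wrong":
            self.rule_idx = (self.rule_idx + 1) % 3
        return correct_key(card, RULES[self.rule_idx])
```

A lose-shift agent like `FlexibleAgent` completes categories; an agent that never updates its rule reproduces the perseverative-error profile the paper elicits from VLLMs via role-playing.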
Problem

Research questions and friction points this paper is trying to address.

Assessing cognitive flexibility in VLLMs using WCST
Exploring input modality and prompting effects on VLLMs
Simulating human cognitive deficits via VLLM role-playing
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLLMs evaluated on the Wisconsin Card Sorting Test
Chain-of-thought prompting enhances performance
Role-playing simulates cognitive deficits
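The prompting setup above can be sketched as a prompt builder: the key cards and trial history are serialized to text and the model is asked to reason step by step before choosing a pile. The wording below is hypothetical; the paper's exact prompts are not reproduced here.

```python
def wcst_prompt(card, history):
    """Build an illustrative chain-of-thought WCST prompt (hypothetical
    wording). `card` is (color, shape, number); `history` holds
    (card_description, chosen_key, feedback) tuples from earlier trials."""
    color, shape, number = card
    lines = [
        "You are playing a card sorting game. Four key cards are on the table:",
        "1: one red triangle, 2: two green stars, "
        "3: three yellow crosses, 4: four blue circles.",
        f"The new card shows {number} {color} {shape}(s).",
        "The sorting rule (color, shape, or number) is hidden "
        "and may change without warning.",
    ]
    # Expose a short feedback window so the model can infer the active rule.
    for card_desc, choice, fb in history[-5:]:
        lines.append(f"Previously: {card_desc} -> key {choice} ({fb}).")
    lines.append("Think step by step about which rule the feedback supports, "
                 "then answer with a single key number 1-4.")
    return "\n".join(lines)
```

For the role-playing condition, a system-style preamble (e.g. instructing the model to respond like a patient with impaired set-shifting) would be prepended to the same trial text.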
Guangfu Hao
Laboratory of Brain Atlas and Brain-inspired Intelligence, Institute of Automation, CAS
Computational Neuroscience · Brain-Inspired Neural Networks · Large Language Models · Cognitive Models
Frederic Alexandre
Inria
Computational Neuroscience · Cognitive Neuroscience · Machine Learning · Artificial Intelligence
Shan Yu
Laboratory of Brain Atlas and Brain-inspired Intelligence, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China