Assessing Consciousness-Related Behaviors in Large Language Models Using the Maze Test

📅 2025-08-22
🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit behaviors associated with consciousness, specifically spatial awareness, perspective-taking, goal-directedness, and temporal consistency. We propose the first behaviorally grounded evaluation framework, derived from 13 theoretical accounts of consciousness, and design a first-person maze navigation task to systematically assess 12 state-of-the-art LLMs under zero-shot, one-shot, and few-shot settings. Our key contribution lies in operationalizing abstract consciousness theories into quantifiable behavioral metrics and in introducing the first subjective-perspective evaluation of LLMs' self-consistency and narrative coherence. Results show that reasoning-augmented models substantially outperform baselines: Gemini 2.0 Pro achieves 52.9% full-path accuracy, while DeepSeek-R1 attains 80.5% partial-path accuracy. However, all models fail to sustain a stable self-representation across trials, revealing a fundamental limitation in unified, temporally persistent self-awareness.

📝 Abstract
We investigate consciousness-like behaviors in Large Language Models (LLMs) using the Maze Test, which challenges models to navigate mazes from a first-person perspective. The test simultaneously probes spatial awareness, perspective-taking, goal-directed behavior, and temporal sequencing, all key consciousness-associated characteristics. After synthesizing consciousness theories into 13 essential characteristics, we evaluated 12 leading LLMs across zero-shot, one-shot, and few-shot learning scenarios. Results showed reasoning-capable LLMs consistently outperforming standard versions, with Gemini 2.0 Pro achieving 52.9% Complete Path Accuracy and DeepSeek-R1 reaching 80.5% Partial Path Accuracy. The gap between these metrics indicates that LLMs struggle to maintain coherent self-models throughout a solution, a fundamental aspect of consciousness. While LLMs show progress toward consciousness-related behaviors through reasoning mechanisms, they lack the integrated, persistent self-awareness characteristic of consciousness.
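The abstract contrasts Complete Path Accuracy with Partial Path Accuracy but does not define them here. A minimal sketch of one plausible operationalization (an assumption for illustration, not the paper's actual scoring code): complete accuracy as the fraction of mazes solved with an exactly correct move sequence, and partial accuracy as the mean fraction of correct moves before the first deviation.

```python
def complete_path_accuracy(predicted, gold):
    """Fraction of mazes whose predicted move sequence matches the gold path exactly."""
    exact = sum(p == g for p, g in zip(predicted, gold))
    return exact / len(gold)

def partial_path_accuracy(predicted, gold):
    """Mean per-maze fraction of correct moves before the first deviation."""
    scores = []
    for p, g in zip(predicted, gold):
        correct = 0
        for a, b in zip(p, g):
            if a != b:
                break
            correct += 1
        scores.append(correct / len(g))
    return sum(scores) / len(scores)

# Hypothetical example: maze 1 deviates at step 3, maze 2 is solved exactly.
gold = [["N", "E", "E", "S"], ["W", "W"]]
pred = [["N", "E", "S", "S"], ["W", "W"]]
print(complete_path_accuracy(pred, gold))  # 0.5
print(partial_path_accuracy(pred, gold))   # (2/4 + 2/2) / 2 = 0.75
```

Under this reading, a large gap between the two scores (as with DeepSeek-R1's 80.5% partial vs. much lower complete accuracy) means models often start on the correct path but lose coherence before the goal.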
Problem

Research questions and friction points this paper is trying to address.

Assessing consciousness-like behaviors in large language models
Evaluating spatial awareness and perspective-taking via maze navigation
Testing integrated self-awareness and coherent self-models in AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Maze Test to evaluate consciousness-like behaviors
Synthesizing consciousness theories into 13 measurable characteristics
Testing LLMs across zero-shot, one-shot, and few-shot scenarios
Rui A. Pimenta
IU International University of Applied Sciences, Germany
Tim Schlippe
Silicon Surfer & IU International University of Applied Sciences
Artificial Intelligence · Speech Processing · Natural Language Processing
Kristina Schaaff
IU International University of Applied Sciences, Germany