Spatial CAPTCHA: Generatively Benchmarking Spatial Reasoning for Human-Machine Differentiation

📅 2025-10-04
🤖 AI Summary
Traditional CAPTCHAs, which center on text recognition or 2D image understanding, are increasingly vulnerable as multimodal large language models (MLLMs) advance rapidly.

Method: We propose Spatial CAPTCHA, a human-verification framework explicitly grounded in fundamental differences in spatial reasoning between humans and MLLMs, covering geometric reasoning, perspective-taking (viewpoint transformation), occlusion handling, and mental rotation. The approach combines a procedural generation pipeline with constraint-based difficulty control, automated correctness verification, and human-in-the-loop validation to produce a scalable, robust, and adaptable challenge suite.

Contribution/Results: Beyond security, Spatial CAPTCHA serves as a benchmark for evaluating AI spatial cognition. On Spatial-CAPTCHA-Bench, humans vastly outperform ten state-of-the-art MLLMs, the best of which reaches only 31.0% Pass@1, and a comparison with Google reCAPTCHA supports its effectiveness as a security mechanism.
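The results are reported as Pass@1. The paper's exact sampling protocol is not stated here, so the sketch below assumes the widely used unbiased pass@k estimator from LLM code-generation evaluation: for an item with n sampled answers of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k), averaged over all items. With k = 1 this reduces to c/n, so when each model answers each question once, Pass@1 is simply the fraction of questions answered correctly. The function names and the per-item results are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one item: n sampled answers, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over items; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-item results (one answer per question), not data from the paper.
results = [(1, 1), (1, 0), (1, 0), (1, 1)]
print(f"Pass@1 = {benchmark_pass_at_k(results):.1%}")  # Pass@1 = 50.0%
```

Using the estimator rather than a raw accuracy keeps the same code valid if several answers per question are sampled later.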

📝 Abstract
Online services rely on CAPTCHAs as a first line of defense against automated abuse, yet recent advances in multimodal large language models (MLLMs) have eroded the effectiveness of conventional designs that focus on text recognition or 2D image understanding. To address this challenge, we present Spatial CAPTCHA, a novel human-verification framework that leverages fundamental differences in spatial reasoning between humans and MLLMs. Unlike existing CAPTCHAs, which rely on low-level perception tasks that are vulnerable to modern AI, Spatial CAPTCHA generates dynamic questions requiring geometric reasoning, perspective-taking, occlusion handling, and mental rotation. These skills are intuitive for humans but difficult for state-of-the-art (SOTA) AI systems. The system employs a procedural generation pipeline with constraint-based difficulty control, automated correctness verification, and human-in-the-loop validation to ensure scalability, robustness, and adaptability. Evaluation on a corresponding benchmark, Spatial-CAPTCHA-Bench, shows that humans vastly outperform 10 SOTA MLLMs, with the best model achieving only 31.0% Pass@1 accuracy. Furthermore, a comparison with Google reCAPTCHA confirms Spatial CAPTCHA's effectiveness as both a security mechanism and a diagnostic tool for spatial reasoning in AI.
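The abstract names a procedural generation pipeline with constraint-based difficulty control and automated correctness verification but does not spell it out. The sketch below is a minimal, hypothetical illustration of that idea for a mental-rotation item, assuming shapes are polycubes (sets of unit-cube coordinates): it samples a random chiral shape, builds one correct option (a rotated copy) plus distractors (its mirror image and unrelated shapes), and then verifies that exactly one option is a pure rotation of the target. All function names, the `n_cubes` difficulty knob, and the option layout are assumptions, not the authors' implementation.

```python
import random
from itertools import permutations, product


# The 24 proper rotations of a cube, as signed axis permutations with det = +1.
def _parity(p):
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return -1 if inversions % 2 else 1


ROTATIONS = [
    (p, s)
    for p in permutations(range(3))
    for s in product((1, -1), repeat=3)
    if _parity(p) * s[0] * s[1] * s[2] == 1
]
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def rotate(cubes, rot):
    p, s = rot
    return frozenset(tuple(s[i] * c[p[i]] for i in range(3)) for c in cubes)


def mirror(cubes):
    return frozenset((-x, y, z) for x, y, z in cubes)


def normalize(cubes):
    """Translate so the minimum coordinate on every axis is 0."""
    mins = [min(c[i] for c in cubes) for i in range(3)]
    return frozenset(tuple(c[i] - mins[i] for i in range(3)) for c in cubes)


def canonical(cubes):
    """Key that is identical for shapes equal up to rotation and translation."""
    return min(tuple(sorted(normalize(rotate(cubes, r)))) for r in ROTATIONS)


def random_polycube(n, rng):
    """Grow a face-connected shape of n unit cubes."""
    cubes = {(0, 0, 0)}
    while len(cubes) < n:
        x, y, z = rng.choice(sorted(cubes))
        dx, dy, dz = rng.choice(NEIGHBORS)
        cubes.add((x + dx, y + dy, z + dz))
    return frozenset(cubes)


def verify_item(target, options):
    """Automated correctness check: exactly one option is a rotation of the target."""
    key = canonical(target)
    return sum(canonical(o) == key for o in options) == 1


def make_rotation_item(n_cubes, n_options, rng, max_tries=1000):
    for _ in range(max_tries):
        target = random_polycube(n_cubes, rng)
        key, mirror_key = canonical(target), canonical(mirror(target))
        if key == mirror_key:
            continue  # achiral: the mirror distractor would also be correct
        answer = normalize(rotate(target, rng.choice(ROTATIONS)))
        options = [answer, normalize(rotate(mirror(target), rng.choice(ROTATIONS)))]
        seen = {key, mirror_key}
        for _ in range(100 * n_options):
            if len(options) == n_options:
                break
            other = random_polycube(n_cubes, rng)
            if canonical(other) not in seen:  # distractors must be pairwise distinct
                seen.add(canonical(other))
                options.append(normalize(rotate(other, rng.choice(ROTATIONS))))
        rng.shuffle(options)
        if len(options) == n_options and verify_item(target, options):
            return target, options, options.index(answer)
    raise RuntimeError("no valid item found; relax the constraints")


rng = random.Random(0)
target, options, answer_idx = make_rotation_item(n_cubes=6, n_options=4, rng=rng)
print(f"{len(options)} options, correct index {answer_idx}")
```

Rejecting achiral shapes is the key constraint here: for such a shape the mirror distractor would itself be a rotation of the target, leaving the item with two correct answers. Raising `n_cubes` is one plausible difficulty knob; rendering the options as images is outside the scope of this sketch.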
Problem

Research questions and friction points this paper is trying to address.

Addressing the vulnerability of conventional CAPTCHAs to AI solvers by exploiting spatial reasoning
Developing a human-verification framework built on geometric and perspective-taking tasks
Creating a benchmark that requires mental rotation and occlusion handling
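Occlusion handling, listed above, needs a ground-truth answer that a generator can compute without human labeling. The sketch below assumes a hypothetical scene of unit cubes stacked on a grid (a heightmap, `heights[y][x]`) viewed orthographically along one coordinate axis; the paper's actual scenes, viewpoints, and question templates are not described here. A cube counts as visible if it is the first cube along its viewing ray, and the answer to "how many cubes cannot be seen from this view?" is the total minus the visible ones.

```python
def cubes_from_heights(heights):
    """Hypothetical scene: heights[y][x] unit cubes stacked on grid cell (x, y)."""
    return {
        (x, y, z)
        for y, row in enumerate(heights)
        for x, h in enumerate(row)
        for z in range(h)
    }


def visible(cubes, axis, viewer_at_low_end):
    """Cubes hit first by orthographic rays parallel to `axis` (0=x, 1=y, 2=z)."""
    first = {}
    for c in cubes:
        ray = c[:axis] + c[axis + 1:]  # the two coordinates that identify the ray
        depth = c[axis] if viewer_at_low_end else -c[axis]
        if ray not in first or depth < first[ray][0]:
            first[ray] = (depth, c)
    return {c for _, c in first.values()}


def hidden_count(heights, axis, viewer_at_low_end):
    """Ground-truth answer to 'how many cubes cannot be seen from this view?'."""
    cubes = cubes_from_heights(heights)
    return len(cubes) - len(visible(cubes, axis, viewer_at_low_end))


heights = [[2, 1],
           [3, 0]]
print(hidden_count(heights, axis=1, viewer_at_low_end=True))   # front view: 2
print(hidden_count(heights, axis=2, viewer_at_low_end=False))  # top view: 3
```

Because `hidden_count` derives the answer from the scene description rather than from a rendered image, every generated question of this kind has a checkable ground truth.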
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates dynamic questions requiring geometric reasoning
Uses procedural generation with constraint-based difficulty control
Leverages human-machine differences in spatial reasoning
Arina Kharlamova
Software Developer
Bowei He
City University of Hong Kong, MBZUAI
Data Mining, Language Model, GenAI4Science, Agentic AI
Chen Ma
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Xue Liu
Department of Computer Science, MBZUAI, Abu Dhabi, UAE