🤖 AI Summary
Current visual CAPTCHAs face two security threats: conventional automated attacks remain effective, and state-of-the-art vision foundation models now outperform humans on many CAPTCHA tasks, which undermines CAPTCHA's fundamental design premise. To address this, we propose NGCAPTCHA, the first framework unifying *computational friction* with *perceptually robust visual challenges*. It imposes a lightweight, hash-based proof-of-work (PoW) directly in the browser to throttle high-volume requests, and introduces a human-centered semantic image selection task that leverages adversarial perceptual features, validated via a low-latency Web protocol. This co-design overcomes the limitations of single-axis defenses and achieves, for the first time, robust resistance against mainstream vision foundation models. Experiments across multiple benchmarks show machine break rates reduced to <0.1%, a three-order-of-magnitude improvement over the best prior baseline, while maintaining a >99.7% human success rate and an average verification latency below 1.2 seconds.
📝 Abstract
CAPTCHAs are widely employed to distinguish humans from automated bots online. However, current vision-based CAPTCHAs face escalating security risks: traditional attacks continue to bypass many deployed CAPTCHA schemes, and recent breakthroughs in AI, particularly large-scale vision models, enable machine solvers to significantly outperform humans on many CAPTCHA tasks, undermining their original design assumptions. To address these issues, we introduce NGCAPTCHA, a Next Generation CAPTCHA framework that integrates a lightweight client-side proof-of-work (PoW) mechanism with an AI-resistant visual recognition challenge. In NGCAPTCHA, a browser must first complete a small hash-based PoW before any challenge is displayed, throttling large-scale automated attempts by increasing their computational cost. Once the PoW is solved, the user is presented with a human-friendly yet model-resistant image selection task that exploits perceptual cues current vision systems still struggle with. This hybrid design combines computational friction with AI-robust visual discrimination, substantially raising the barrier for automated bots while keeping the verification process fast and effortless for legitimate users.
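The hash-based PoW step described in the abstract can be illustrated with a minimal hashcash-style sketch. The abstract does not specify the hash function, the difficulty, or the message encoding, so everything here is an assumption made for illustration: SHA-256 as the hash, a difficulty expressed as a number of leading zero bits, an 8-byte big-endian counter, and the hypothetical function names `new_challenge`, `solve_pow`, and `verify_pow`. It is written in Python for brevity rather than as browser code.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # A nonzero byte has bit_length() in 1..8; the remainder are zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def new_challenge() -> bytes:
    """Server side (assumed): issue a fresh random challenge per request."""
    return os.urandom(16)


def verify_pow(challenge: bytes, counter: int, difficulty_bits: int) -> bool:
    """Server side (assumed): a single hash checks the client's answer."""
    digest = hashlib.sha256(challenge + counter.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve_pow(challenge: bytes, difficulty_bits: int) -> int:
    """Client side (assumed): brute-force a counter that meets the target.

    Expected work is about 2**difficulty_bits hash evaluations.
    """
    counter = 0
    while not verify_pow(challenge, counter, difficulty_bits):
        counter += 1
    return counter


if __name__ == "__main__":
    challenge = new_challenge()
    difficulty = 12  # illustrative value, not taken from the paper
    answer = solve_pow(challenge, difficulty)
    print("valid:", verify_pow(challenge, answer, difficulty))
```

The asymmetry is the point of the design: solving costs roughly `2**difficulty_bits` hashes on average, while verification costs one. A single legitimate user pays a small one-time cost, whereas a client issuing requests in bulk pays that cost on every attempt.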