AI Summary
Deep neural networks (DNNs) exhibit substantial deviations from human visual perception in illusory contour detection (e.g., abutting gratings). To address this cognitive misalignment, we propose ICPNet, the first model to explicitly encode human visual shape priors as differentiable shape constraints within a multi-scale feedforward-feedback architecture. Methodologically, ICPNet integrates Multi-scale Feature Projection (MFP), a Feature Interaction Attention Module (FIAM), and an Edge Fusion Module (EFM) to achieve contour-sensitive feature enhancement and cross-scale consistency optimization. Evaluated on AG-MNIST and AG-Fashion-MNIST benchmarks, ICPNet achieves significant Top-1 accuracy improvements over state-of-the-art models, with a 12.3% gain in illusory contour recognition accuracy. Extensive experiments further validate its superior generalization and improved alignment with human perceptual judgments. This work establishes a novel paradigm for developing cognitively aligned vision models grounded in neuroscientific principles of shape perception.
Abstract
Higher levels of machine intelligence demand alignment with human perception and cognition. Machine intelligence is currently dominated by deep neural networks (DNNs), which have demonstrated exceptional performance across various real-world tasks. Nevertheless, recent evidence suggests that DNNs fail to perceive illusory contours such as the abutting grating, a failure that diverges from human perception. Departing from previous work, we propose a novel deep network, the illusory contour perception network (ICPNet), inspired by the circuits of the visual cortex. In ICPNet, a multi-scale feature projection (MFP) module is designed to extract multi-scale representations. To boost the interaction between feedforward and feedback features, a feature interaction attention module (FIAM) is introduced. Moreover, drawing inspiration from the shape bias observed in human perception, an auxiliary edge detection task, carried out by the edge fusion module (EFM), injects shape constraints that guide the network to concentrate on the foreground. We assess our method on the existing AG-MNIST test set and on the AG-Fashion-MNIST test sets constructed in this work. Comprehensive experimental results reveal that ICPNet is significantly more sensitive to abutting grating illusory contours than state-of-the-art models, with notable improvements in top-1 accuracy across various subsets. We expect this work to be a step towards human-level intelligence for DNN-based models.
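For readers unfamiliar with the stimulus, the following is a minimal sketch of how an abutting grating can be rendered from a binary shape mask. It is only an illustration of the general idea (a grating that is phase-shifted inside the shape, so the contour exists only where line ends meet), not the procedure used to build AG-MNIST or AG-Fashion-MNIST. The function name `abutting_grating` and the parameters `period` and `shift` are hypothetical.

```python
import numpy as np


def abutting_grating(mask: np.ndarray, period: int = 4, shift: int = 2) -> np.ndarray:
    """Render a binary shape mask as an abutting grating (illustrative only).

    Horizontal lines are drawn every `period` rows. Inside the shape the
    grating is offset vertically by `shift` rows, so the shape boundary is
    marked only by where the lines terminate, not by a luminance edge.
    """
    mask = mask.astype(bool)
    rows = np.arange(mask.shape[0])[:, None]      # row indices as a column vector
    background = (rows % period) == 0             # grating outside the shape
    foreground = ((rows + shift) % period) == 0   # phase-shifted grating inside
    # np.where broadcasts the (H, 1) gratings against the (H, W) mask.
    return np.where(mask, foreground, background).astype(np.uint8)


# Toy example: a filled square stands in for a digit or clothing mask.
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True
img = abutting_grating(mask)
```

With `shift` set to half of `period`, the inside and outside gratings have the same line density, so the square in `img` is not marked by a difference in average brightness and has to be inferred from the misaligned line endings.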