PRISM: Progressive Reasoning through Iterative Slot Memory for Vision

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing vision models process an image in a single feed-forward pass, which makes it hard to handle incomplete observations or recover missing information. This work proposes PRISM, a pyramid vision architecture that runs an iterative “organize-recall-refine” inference loop built from object-centric slot representations, a learned memory bank, and recurrent refinement across multiple scales. The method achieves competitive performance on image classification, object detection, and semantic segmentation, with improved robustness under incomplete observations such as occlusion.
📝 Abstract
Modern vision models process images in a single feed-forward pass, which limits their ability to recover missing evidence or refine uncertain representations under incomplete observations. Inspired by the iterative nature of human perception, we introduce PRISM (Progressive Reasoning through Iterative Slot Memory), a pyramid vision architecture that reasons over images through iterative refinement. At a high level, PRISM groups visual features into object-centric representations, retrieves relevant patterns from a learned memory, and iteratively refines the representation to resolve ambiguity and recover missing information. This organize-recall-refine process operates recurrently across multiple scales, enabling progressive improvement of visual representations. Across standard vision tasks, including image classification, object detection, and semantic segmentation, PRISM achieves competitive performance while demonstrating improved robustness under incomplete observations such as occlusion. These results suggest that iterative reasoning with structured representations and memory is a promising direction for building more resilient and adaptive vision models. Source code and models will be released.
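To make the organize-recall-refine loop from the abstract concrete, here is a minimal NumPy sketch of one possible reading of it. It is not the authors' implementation: the function names (`organize`, `recall`, `refine`), the slot-attention-style grouping, the dot-product memory lookup, the averaging update rule, and all shapes are assumptions chosen for illustration. A real model would use learned projections and would repeat the loop across pyramid scales; both are omitted here.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def organize(features, slots):
    """Group N feature vectors into K slots (slot-attention-style, assumed)."""
    d = features.shape[1]
    logits = slots @ features.T / np.sqrt(d)            # (K, N)
    attn = softmax(logits, axis=0)                      # slots compete for each feature
    weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)  # normalize per slot
    return weights @ features                           # (K, D) weighted means

def recall(queries, memory):
    """Retrieve a soft mixture of memory entries for each query (assumed lookup)."""
    d = queries.shape[1]
    attn = softmax(queries @ memory.T / np.sqrt(d), axis=1)    # (K, M)
    return attn @ memory                                # (K, D)

def refine(slots, grouped, recalled, step=0.5):
    """Move each slot toward the mean of grouped evidence and recalled patterns."""
    target = 0.5 * (grouped + recalled)                 # hypothetical fusion rule
    return slots + step * (target - slots)

def organize_recall_refine(features, slots, memory, num_iters=3):
    """Run the three stages recurrently for a fixed number of iterations."""
    for _ in range(num_iters):
        grouped = organize(features, slots)
        recalled = recall(grouped, memory)
        slots = refine(slots, grouped, recalled)
    return slots

# Toy inputs: 16 features, 4 slots, 32 memory entries, dimension 8.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))
slots0 = rng.normal(size=(4, 8))
memory = rng.normal(size=(32, 8))   # stands in for a learned memory bank

slots = organize_recall_refine(features, slots0, memory)
print(slots.shape)  # (4, 8)
```

In this sketch, the recalled memory patterns are what could fill in evidence that is missing from the input: zeroing some rows of `features` to mimic occlusion weakens `grouped`, while `recalled` still contributes to the update. Whether PRISM fuses the two signals this way is not stated in the abstract.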
Problem

Research questions and friction points this paper is trying to address.

incomplete observations
visual representation
occlusion
robustness
iterative reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

iterative reasoning
slot memory
object-centric representation
pyramid vision architecture
robustness to occlusion