PRJ: Perception-Retrieval-Judgement for Generated Images

📅 2025-06-04
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing binary classification systems inadequately detect the implicit and explicit safety risks—such as hate symbols, copyright violations, and suggestive violence—in generative AI images, suffering from poor semantic interpretability and an inability to dynamically quantify toxicity. To address this, we propose a cognition-inspired three-stage framework: (1) *Perception*, converting images into structured natural-language descriptions; (2) *Retrieval*, augmenting hazard detection with knowledge-enriched associations; and (3) *Judgement*, performing rule-guided linguistic reasoning. We introduce the first language-centric structured reasoning paradigm for this task, enabling context-sensitive toxicity risk matrices that support both category-level interpretability and semantic-dimension quantification. Our method achieves significant improvements in detection accuracy and adversarial robustness across multiple safety benchmarks, and uniquely enables structured toxicity attribution and intensity quantification.
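To make the three-stage flow concrete, here is a minimal Python sketch of the pipeline as described in the summary. The stage functions (`perceive`, `retrieve`, `judge`), the `Evidence`/`Verdict` types, and the keyword-based knowledge lookup are hypothetical stand-ins, not the authors' implementation; a real system would use a vision-language model for perception, embedding-based retrieval over a harm knowledge base, and an LLM prompted with normative rules for judgement.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    category: str  # e.g. "hate_symbol", "copyright", "violence"
    rule: str      # the legal/normative rule this evidence grounds

@dataclass
class Verdict:
    is_harmful: bool
    categories: list[str]
    rationale: str

def perceive(image_path: str) -> str:
    # Stage 1 (Perception): image -> structured natural-language description.
    # Placeholder: a real system would call a captioning / vision-language model.
    return f"an image at {image_path} showing a flag bearing a swastika"

def retrieve(description: str, kb: dict[str, Evidence]) -> list[Evidence]:
    # Stage 2 (Retrieval): attach harm-related knowledge to the description.
    # Naive substring matching; a real system would use embedding retrieval.
    return [ev for trigger, ev in kb.items() if trigger in description.lower()]

def judge(description: str, evidence: list[Evidence]) -> Verdict:
    # Stage 3 (Judgement): rule-guided decision over description + evidence.
    # A real system would prompt an LLM with the retrieved rules.
    if not evidence:
        return Verdict(False, [], "No rule triggered.")
    categories = sorted({ev.category for ev in evidence})
    rationale = " ".join(ev.rule for ev in evidence)
    return Verdict(True, categories, rationale)

def prj_pipeline(image_path: str, kb: dict[str, Evidence]) -> Verdict:
    description = perceive(image_path)
    return judge(description, retrieve(description, kb))

if __name__ == "__main__":
    kb = {"swastika": Evidence("hate_symbol",
                               "Displaying hate symbols violates platform policy.")}
    print(prj_pipeline("example.png", kb))
```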

📝 Abstract
The rapid progress of generative AI has enabled remarkable creative capabilities, yet it also raises urgent concerns regarding the safety of AI-generated visual content in real-world applications such as content moderation, platform governance, and digital media regulation. This includes unsafe material such as sexually explicit images, violent scenes, hate symbols, propaganda, and unauthorized imitations of copyrighted artworks. Existing image safety systems often rely on rigid category filters and produce binary outputs, lacking the capacity to interpret context or reason about nuanced, adversarially induced forms of harm. In addition, standard evaluation metrics (e.g., attack success rate) fail to capture the semantic severity and dynamic progression of toxicity. To address these limitations, we propose Perception-Retrieval-Judgement (PRJ), a cognitively inspired framework that models toxicity detection as a structured reasoning process. PRJ follows a three-stage design: it first transforms an image into descriptive language (perception), then retrieves external knowledge related to harm categories and traits (retrieval), and finally evaluates toxicity based on legal or normative rules (judgement). This language-centric structure enables the system to detect both explicit and implicit harms with improved interpretability and categorical granularity. In addition, we introduce a dynamic scoring mechanism based on a contextual toxicity risk matrix to quantify harmfulness across different semantic dimensions. Experiments show that PRJ surpasses existing safety checkers in detection accuracy and robustness while uniquely supporting structured category-level toxicity interpretation.
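The abstract does not spell out how the contextual toxicity risk matrix works, so the sketch below shows one plausible reading under stated assumptions: per-dimension toxicity estimates are reweighted by a context-dependent matrix and then aggregated into a single score. The dimension and context names, the matrix values, and the max-aggregation rule are all illustrative choices, not the paper's actual design.

```python
# Illustrative semantic dimensions and deployment contexts; the paper's
# actual matrix design may differ.
DIMENSIONS = ["sexual", "violence", "hate", "copyright"]
CONTEXTS = ["social_media", "advertising", "art_platform"]

# Contextual risk matrix (rows = dimensions, columns = contexts): how
# severely each harm dimension counts in each context. Placeholder values.
RISK_MATRIX = [
    [0.9, 1.0, 0.4],  # sexual content: mildest on an art platform
    [0.8, 0.9, 0.6],  # violence
    [1.0, 1.0, 1.0],  # hate symbols: severe everywhere
    [0.5, 0.8, 0.9],  # copyright imitation: worst where art is traded
]

def toxicity_score(dim_scores: dict[str, float], context: str) -> float:
    """Combine per-dimension toxicity estimates (each in [0, 1]) into one
    context-aware score using the matrix column for the given context.
    Max-aggregation means a single severe harm dominates the overall score."""
    col = CONTEXTS.index(context)
    return max(
        RISK_MATRIX[row][col] * dim_scores.get(dim, 0.0)
        for row, dim in enumerate(DIMENSIONS)
    )

if __name__ == "__main__":
    detected = {"hate": 0.7, "violence": 0.2}
    print(toxicity_score(detected, "social_media"))  # -> 0.7
```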
Problem

Research questions and friction points this paper is trying to address.

Detecting unsafe AI-generated images when existing filters lack contextual interpretation
Quantifying the semantic severity and dynamic progression of toxicity
Improving the interpretability of harm detection through structured reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Perception-Retrieval-Judgement framework for toxicity detection
Language-centric structure for detecting both explicit and implicit harms
Dynamic scoring mechanism based on a contextual toxicity risk matrix
Qiang Fu
School of Computer Science and Engineering, Beihang University, Beijing 100191, China; School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; State Key Laboratory of Complex & Critical Software Environment (SKLCCSE), Beihang University, Beijing 100191, China
Zonglei Jing
Beihang University
Machine Learning, Reinforcement Learning, Optimal Control
Zonghao Ying
SKLCCSE, BUAA
Trustworthy AI
Xiaoqian Li
School of Mathematics and Statistics, Taishan University, Tai’an 271000, China