PRJ: Perception-Retrieval-Judgement for Generated Images

📅 2025-06-04
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing binary classification systems inadequately detect the implicit and explicit safety risks—such as hate symbols, copyright violations, and suggestive violence—in generative AI images, suffering from poor semantic interpretability and an inability to dynamically quantify toxicity. To address this, we propose a cognition-inspired three-stage framework: (1) *Perception*, converting images into structured natural-language descriptions; (2) *Retrieval*, augmenting hazard detection with knowledge-enriched associations; and (3) *Judgement*, performing rule-guided linguistic reasoning. We introduce the first language-centric structured reasoning paradigm for this task, enabling context-sensitive toxicity risk matrices that support both category-level interpretability and semantic-dimension quantification. Our method achieves significant improvements in detection accuracy and adversarial robustness across multiple safety benchmarks, and uniquely enables structured toxicity attribution and intensity quantification.
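To make the three-stage flow concrete, here is a minimal Python sketch of the pipeline as described in the summary. The stage functions (`perceive`, `retrieve`, `judge`), the `Evidence`/`Verdict` types, and the keyword-based knowledge lookup are hypothetical stand-ins, not the authors' implementation; a real system would use a vision-language model for perception, embedding-based retrieval over a harm knowledge base, and an LLM prompted with normative rules for judgement.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    category: str  # e.g. "hate_symbol", "copyright", "violence"
    rule: str      # the legal/normative rule this evidence grounds

@dataclass
class Verdict:
    is_harmful: bool
    categories: list[str]
    rationale: str

def perceive(image_path: str) -> str:
    # Stage 1 (Perception): image -> structured natural-language description.
    # Placeholder: a real system would call a captioning / vision-language model.
    return f"an image at {image_path} showing a flag bearing a swastika"

def retrieve(description: str, kb: dict[str, Evidence]) -> list[Evidence]:
    # Stage 2 (Retrieval): attach harm-related knowledge to the description.
    # Naive substring matching; a real system would use embedding retrieval.
    return [ev for trigger, ev in kb.items() if trigger in description.lower()]

def judge(description: str, evidence: list[Evidence]) -> Verdict:
    # Stage 3 (Judgement): rule-guided decision over description + evidence.
    # A real system would prompt an LLM with the retrieved rules.
    if not evidence:
        return Verdict(False, [], "No rule triggered.")
    categories = sorted({ev.category for ev in evidence})
    rationale = " ".join(ev.rule for ev in evidence)
    return Verdict(True, categories, rationale)

def prj_pipeline(image_path: str, kb: dict[str, Evidence]) -> Verdict:
    description = perceive(image_path)
    return judge(description, retrieve(description, kb))

if __name__ == "__main__":
    kb = {"swastika": Evidence("hate_symbol",
                               "Displaying hate symbols violates platform policy.")}
    print(prj_pipeline("example.png", kb))
```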

📝 Abstract
The rapid progress of generative AI has enabled remarkable creative capabilities, yet it also raises urgent concerns regarding the safety of AI-generated visual content in real-world applications such as content moderation, platform governance, and digital media regulation. This includes unsafe material such as sexually explicit images, violent scenes, hate symbols, propaganda, and unauthorized imitations of copyrighted artworks. Existing image safety systems often rely on rigid category filters and produce binary outputs, lacking the capacity to interpret context or reason about nuanced, adversarially induced forms of harm. In addition, standard evaluation metrics (e.g., attack success rate) fail to capture the semantic severity and dynamic progression of toxicity. To address these limitations, we propose Perception-Retrieval-Judgement (PRJ), a cognitively inspired framework that models toxicity detection as a structured reasoning process. PRJ follows a three-stage design: it first transforms an image into descriptive language (perception), then retrieves external knowledge related to harm categories and traits (retrieval), and finally evaluates toxicity based on legal or normative rules (judgement). This language-centric structure enables the system to detect both explicit and implicit harms with improved interpretability and categorical granularity. In addition, we introduce a dynamic scoring mechanism based on a contextual toxicity risk matrix to quantify harmfulness across different semantic dimensions. Experiments show that PRJ surpasses existing safety checkers in detection accuracy and robustness while uniquely supporting structured category-level toxicity interpretation.
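The abstract does not spell out how the contextual toxicity risk matrix works, so the sketch below shows one plausible reading under stated assumptions: per-dimension toxicity estimates are reweighted by a context-dependent matrix and then aggregated into a single score. The dimension and context names, the matrix values, and the max-aggregation rule are all illustrative choices, not the paper's actual design.

```python
# Illustrative semantic dimensions and deployment contexts; the paper's
# actual matrix design may differ.
DIMENSIONS = ["sexual", "violence", "hate", "copyright"]
CONTEXTS = ["social_media", "advertising", "art_platform"]

# Contextual risk matrix (rows = dimensions, columns = contexts): how
# severely each harm dimension counts in each context. Placeholder values.
RISK_MATRIX = [
    [0.9, 1.0, 0.4],  # sexual content: mildest on an art platform
    [0.8, 0.9, 0.6],  # violence
    [1.0, 1.0, 1.0],  # hate symbols: severe everywhere
    [0.5, 0.8, 0.9],  # copyright imitation: worst where art is traded
]

def toxicity_score(dim_scores: dict[str, float], context: str) -> float:
    """Combine per-dimension toxicity estimates (each in [0, 1]) into one
    context-aware score using the matrix column for the given context.
    Max-aggregation means a single severe harm dominates the overall score."""
    col = CONTEXTS.index(context)
    return max(
        RISK_MATRIX[row][col] * dim_scores.get(dim, 0.0)
        for row, dim in enumerate(DIMENSIONS)
    )

if __name__ == "__main__":
    detected = {"hate": 0.7, "violence": 0.2}
    print(toxicity_score(detected, "social_media"))  # -> 0.7
```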
Problem

Research questions and friction points this paper is trying to address.

Detecting unsafe AI-generated images when existing filters lack contextual interpretation
Quantifying the semantic severity and dynamic progression of toxicity
Improving the interpretability of harm detection through structured reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Perception-Retrieval-Judgement framework for toxicity detection
Language-centric structure for detecting both explicit and implicit harms
Dynamic scoring mechanism based on a contextual toxicity risk matrix
Qiang Fu
School of Computer Science and Engineering, Beihang University, Beijing 100191, China; School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; State Key Laboratory of Complex & Critical Software Environment (SKLCCSE), Beihang University, Beijing 100191, China
Zonglei Jing
Beihang University
Machine Learning, Reinforcement Learning, Optimal Control
Zonghao Ying
SKLCCSE, BUAA
Trustworthy AI
Xiaoqian Li
School of Mathematics and Statistics, Taishan University, Tai’an 271000, China