Visual Commonsense Driven Knowledge Refinements for Scene Graph Generation

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing scene graph generation (SGG) models struggle to learn reliable visual commonsense under sparse annotations, which sharply degrades their performance on rare relationships. This work proposes a model-agnostic, semantics-guided knowledge refinement framework that automatically mines spatial, functional, and qualitative relational patterns from training data and then, at inference time, corrects ranked predictions through declarative commonsense reasoning. Because it requires neither handcrafted rules nor model retraining, the method transfers readily across datasets and architectures, integrating structured visual commonsense reasoning into otherwise purely learning-driven scene graph generation pipelines. It consistently improves over strong baselines on three standard benchmarks, indicating that visual commonsense is a practical and effective complement to learned SGG models.

📝 Abstract

Learning-driven Scene Graph Generation (SGG) models excel on frequent relation types but degrade sharply under annotation sparsity, failing to capture reliable visual commonsense knowledge. We propose a model-agnostic, semantically guided knowledge refinement framework that systematically mines commonsense-grounded constraints from training data, capturing spatial, functional, and qualitative relational regularities, and uses general declarative commonsense reasoning to correct and refine ranked SGG predictions at inference time. The framework requires no manual rule authoring and no model retraining, and it transfers across datasets and architectures. On three standard benchmarks, we obtain consistent improvements over strong baselines, demonstrating that structured visual commonsense reasoning over deep scene semantics is a practical and effective complement to purely learning-based scene graph generation.
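
The two-stage idea in the abstract (mine constraints from training data, then refine ranked predictions at inference time) can be illustrated with a deliberately simple sketch. This is not the paper's method: the abstract does not specify how constraints are represented or how the declarative reasoner works, so the sketch swaps in a plain co-occurrence filter as a stand-in. The function names `mine_constraints` and `refine`, the `min_support` and `penalty` parameters, and the toy triples are all hypothetical.

```python
from collections import defaultdict


def mine_constraints(training_triples, min_support=1):
    """Collect, per (subject, object) class pair, the predicates seen
    at least `min_support` times in the training annotations.

    Assumption: a co-occurrence set is a stand-in for the paper's
    commonsense-grounded constraints, which the abstract does not detail.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for subj, pred, obj in training_triples:
        counts[(subj, obj)][pred] += 1
    return {
        pair: {p for p, n in preds.items() if n >= min_support}
        for pair, preds in counts.items()
    }


def refine(ranked_predictions, constraints, penalty=0.5):
    """Re-rank (subj, pred, obj, score) tuples from any SGG model.

    Predicates never observed for a known class pair are down-weighted
    by `penalty` (a hypothetical value); the base model is not retrained.
    """
    refined = []
    for subj, pred, obj, score in ranked_predictions:
        allowed = constraints.get((subj, obj))
        # Pairs absent from training are left untouched: no evidence either way.
        if allowed is not None and pred not in allowed:
            score *= penalty
        refined.append((subj, pred, obj, score))
    return sorted(refined, key=lambda t: t[3], reverse=True)


# Toy training annotations (hypothetical).
train = [
    ("person", "riding", "horse"),
    ("person", "riding", "horse"),
    ("person", "feeding", "horse"),
    ("cup", "on", "table"),
]
constraints = mine_constraints(train)

# Toy ranked output of an arbitrary base model (hypothetical scores).
predictions = [
    ("person", "wearing", "horse", 0.60),
    ("person", "riding", "horse", 0.55),
    ("cup", "on", "table", 0.90),
]
refined = refine(predictions, constraints)
for triple in refined:
    print(triple)
```

In this toy run the implausible "person wearing horse" candidate drops below "person riding horse" even though the base model scored it higher. Because the refinement only consumes ranked triples, it stays independent of the underlying model, which mirrors the model-agnostic property claimed in the abstract.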

Problem

Research questions and friction points this paper is trying to address.

Scene Graph Generation

Visual Commonsense

Annotation Sparsity

Relational Regularities

Knowledge Refinement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Commonsense Reasoning

Knowledge Refinement

Scene Graph Generation