AI Summary
This work addresses relationship misclassification in scene graph generation caused by polysemous predicates, whose meanings vary with context. The authors propose AlignG, a dynamic, context-aware mechanism for reorganizing predicate semantics: it infers context-conditioned predicate semantics from the relation candidates within each image via prototype feedback, then uses these refined semantics to recalibrate relation representations. A global semantic centroid constraint anchors the adaptation and mitigates semantic drift. Because prototypes can adaptively merge or split based on scene-level evidence, the mechanism overcomes the limitations of static predicate representations. On the SGDet task, the method improves F@100 over state-of-the-art baselines by 1.4 points on VG-150 and 2.7 points on GQA-200.
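For readers unfamiliar with the metric, F@K is commonly reported in the scene graph generation literature as the harmonic mean of R@K (recall) and mR@K (mean recall over predicate classes). The sketch below assumes that convention applies here. The function name `f_at_k` and the input scores are hypothetical and are not results from the paper.

```python
def f_at_k(recall_at_k: float, mean_recall_at_k: float) -> float:
    """Harmonic mean of R@K and mR@K (both in the same units, e.g. percent)."""
    total = recall_at_k + mean_recall_at_k
    if total == 0:
        # Both scores are zero, so the harmonic mean is defined as zero here.
        return 0.0
    return 2.0 * recall_at_k * mean_recall_at_k / total


# Hypothetical scores chosen for easy arithmetic, not taken from the paper.
print(f_at_k(30.0, 20.0))  # 24.0
```

Because the harmonic mean is pulled toward the lower of its two inputs, a gain in F@K indicates that head and tail predicates improve together rather than one at the expense of the other.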
Abstract
In scene graph generation, a central challenge is modeling polysemous predicates whose meanings shift across contexts. Prior approaches address this issue by decomposing predicates into multiple static prototypes or retrieving semantically similar exemplars. However, these strategies keep predicate representations static and cannot reorganize semantics to reflect image-specific evidence, leading to systematic confusions in ambiguous contexts. We propose AlignG, which learns context-conditioned predicate semantics via prototype feedback. AlignG infers context-conditioned predicate semantics from the relation candidates within each image and feeds the adapted semantics back to recalibrate relation representations. The learning objective anchors this adaptation to global semantic centers, preventing semantic drift while still allowing selective reorganization when the scene provides consistent relational cues. Experiments on VG-150 and GQA-200 show consistent improvements over state-of-the-art baselines, with F@100 improvements of +1.4 on VG-150 and +2.7 on GQA-200 under SGDet. We further visualize per-image prototype similarity shifts and observe coherent context-dependent reorganization where prototypes selectively merge or separate predicates according to scene evidence. The code is available at https://github.com/Namgyu97/AlignG-SGG.pytorch.
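The feedback loop described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`adapt_prototypes`, `recalibrate`, `centroid_anchor_loss`), the parameters (`keep`, `temperature`, `alpha`), the soft-assignment rule, and the cosine-distance form of the anchoring loss are all assumptions made for illustration. For the actual method, see the linked repository.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def adapt_prototypes(relations, global_protos, keep=0.7, temperature=0.1):
    """Shift each global prototype toward the relation candidates of one image."""
    rels = [normalize(r) for r in relations]
    protos = [normalize(p) for p in global_protos]
    # Soft assignment of every relation candidate to every prototype.
    assign = [softmax([dot(r, p) / temperature for p in protos]) for r in rels]
    adapted = []
    for j, proto in enumerate(protos):
        mass = sum(a[j] for a in assign)
        # Assignment-weighted mean of the image's relation features.
        context = [
            sum(a[j] * r[d] for a, r in zip(assign, rels)) / (mass + 1e-8)
            for d in range(len(proto))
        ]
        # Assumed update rule: interpolate between global and image-level evidence.
        mixed = [keep * g + (1.0 - keep) * c for g, c in zip(proto, context)]
        adapted.append(normalize(mixed))
    return adapted, assign


def recalibrate(relations, adapted, assign, alpha=0.5):
    """Feed the adapted prototypes back into the relation features (residual add)."""
    out = []
    for r, a in zip(relations, assign):
        feedback = [sum(w * p[d] for w, p in zip(a, adapted)) for d in range(len(r))]
        out.append([x + alpha * f for x, f in zip(r, feedback)])
    return out


def centroid_anchor_loss(adapted, global_protos):
    """Mean cosine distance between adapted prototypes and their global centers."""
    sims = [dot(normalize(a), normalize(g)) for a, g in zip(adapted, global_protos)]
    return sum(1.0 - s for s in sims) / len(sims)


# Toy 2-D features for three relation candidates and two predicate prototypes.
relations = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
global_protos = [[1.0, 0.0], [0.0, 1.0]]
adapted, assign = adapt_prototypes(relations, global_protos)
refined = recalibrate(relations, adapted, assign)
loss = centroid_anchor_loss(adapted, global_protos)
```

In this sketch, `keep` controls how strongly each prototype stays tied to its global center: at `keep=1.0` the prototypes do not move and the anchoring loss is zero, while lower values let image-level evidence pull prototypes together or apart, which is the kind of reorganization the abstract describes.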