🤖 AI Summary
Existing two-stage scene graph generation (SGG) frameworks train a detector and a classifier in a causal chain, which can induce spurious correlations between the detector's inputs and the final predictions. This leads to at least two observable biases: (i) tail relationships are predicted as head relationships, and (ii) foreground relationships are predicted as background, a bias that is seldom discussed in the literature. To address this, we propose the Reverse causal Framework for SGG (RcSGG), which restructures the chain into a reverse causal structure that treats the classifier's inputs as the confounder. RcSGG uses Active Reverse Estimation (ARE) to intervene on the confounder and estimate the reverse causality from the final predictions to the classifier's inputs, and Maximum Information Sampling (MIS) to refine that estimate using relationship information. Experiments on popular benchmarks and diverse SGG frameworks show state-of-the-art mean recall.
📝 Abstract
Existing two-stage Scene Graph Generation (SGG) frameworks typically combine a detector that extracts relationship features with a classifier that categorizes these relationships. The training paradigm therefore follows a causal chain structure: the detector's inputs determine the classifier's inputs, which in turn influence the final predictions. However, such a causal chain can yield spurious correlations between the detector's inputs and the final predictions, i.e., the prediction of one relationship may be influenced by other relationships. This influence induces at least two observable biases: tail relationships are predicted as head ones, and foreground relationships are predicted as background ones. Notably, the latter bias is seldom discussed in the literature. To address this issue, we propose reconstructing the causal chain structure into a reverse causal structure, in which the classifier's inputs are treated as the confounder and both the detector's inputs and the final predictions are viewed as causal variables. We term the reconstructed causal paradigm the Reverse causal Framework for SGG (RcSGG). RcSGG first employs the proposed Active Reverse Estimation (ARE) to intervene on the confounder and estimate the reverse causality, i.e., the causality from the final predictions to the classifier's inputs. The proposed Maximum Information Sampling (MIS) then further improves the reverse causality estimation by taking relationship information into account. Theoretically, RcSGG can mitigate the spurious correlations inherent in the SGG framework and thereby eliminate the biases they induce. Comprehensive experiments on popular benchmarks and diverse SGG frameworks show that RcSGG achieves state-of-the-art mean recall.
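To illustrate why mean recall is the metric of interest when tail relationships are predicted as head ones, the following minimal sketch contrasts overall recall with mean recall on toy data. It is not the paper's evaluation code. The function name `recall_and_mean_recall`, the predicate labels, and the hit flags are all hypothetical, and the computation is simplified by pooling over the whole dataset rather than following any specific benchmark protocol.

```python
from collections import defaultdict


def recall_and_mean_recall(gt_predicates, hits):
    """Compare overall recall with mean (per-predicate) recall.

    gt_predicates: ground-truth predicate label of each annotated triplet.
    hits: True where that triplet was recovered by the model's predictions.
    Both arguments are hypothetical inputs chosen for illustration.
    """
    if len(gt_predicates) != len(hits) or not hits:
        raise ValueError("inputs must be non-empty and of equal length")

    totals = defaultdict(int)   # number of ground-truth triplets per predicate
    matched = defaultdict(int)  # number of recovered triplets per predicate
    for label, hit in zip(gt_predicates, hits):
        totals[label] += 1
        matched[label] += int(hit)

    # Overall recall pools all triplets, so frequent (head) predicates dominate.
    recall = sum(matched.values()) / len(hits)

    # Mean recall averages per-predicate recall, so each predicate counts equally.
    per_class = {label: matched[label] / totals[label] for label in totals}
    mean_recall = sum(per_class.values()) / len(per_class)
    return recall, mean_recall, per_class


# Toy data (assumed): a head predicate "on" that is always recovered and a
# tail predicate "parked on" that is never recovered.
gt = ["on"] * 8 + ["parked on"] * 2
hits = [True] * 8 + [False] * 2
recall, mean_recall, per_class = recall_and_mean_recall(gt, hits)
print(f"recall={recall:.2f}, mean recall={mean_recall:.2f}")
```

In this toy case the overall recall stays high even though the tail predicate is never recovered, while mean recall drops, which is why a debiasing method is naturally evaluated on mean recall.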