OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad

📅 2025-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the significant degradation in generalization performance of foundation models under distribution shift, weak supervision, or adversarial attacks in open-world settings, this paper proposes the Object-Concept-Relation Triad (OCRT) framework. OCRT jointly models sparse high-level semantic concepts and their higher-order relational structures via unsupervised object disentanglement, projection into a semantic concept space, construction of an importance-weighted concept graph, and iterative refinement. It is the first method to achieve *co-disentanglement* of objects, concepts, and relations, coupled with dynamic graph-based reasoning, enabling model-agnostic and task-agnostic generalization enhancement. Evaluated on SAM and CLIP, OCRT substantially improves robustness under out-of-distribution data, weak labeling, and adversarial conditions, yielding an average 12.7% performance gain across multiple downstream tasks while supporting interpretable higher-order relational reasoning.

📝 Abstract
Although foundation models (FMs) claim to be powerful, their generalization ability significantly decreases when faced with distribution shifts, weak supervision, or malicious attacks in the open world. On the other hand, most domain generalization or adversarial fine-tuning methods are task-related or model-specific, ignoring the universality in practical applications and the transferability between FMs. This paper delves into the problem of generalizing FMs to the out-of-domain data. We propose a novel framework, the Object-Concept-Relation Triad (OCRT), that enables FMs to extract sparse, high-level concepts and intricate relational structures from raw visual inputs. The key idea is to bind objects in visual scenes and a set of object-centric representations through unsupervised decoupling and iterative refinement. To be specific, we project the object-centric representations onto a semantic concept space that the model can readily interpret and estimate their importance to filter out irrelevant elements. Then, a concept-based graph, which has a flexible degree, is constructed to incorporate the set of concepts and their corresponding importance, enabling the extraction of high-order factors from informative concepts and facilitating relational reasoning among these concepts. Extensive experiments demonstrate that OCRT can substantially boost the generalizability and robustness of SAM and CLIP across multiple downstream tasks.
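The pipeline described in the abstract (unsupervised object disentanglement, projection onto a concept space, importance-based filtering, and an importance-weighted concept graph with iterative refinement) can be illustrated with a minimal numerical sketch. This is not the authors' implementation: all shapes, the max-based importance score, the threshold, and the cosine-similarity graph are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: K object slots, d-dim slot vectors, C concepts.
K, d, C = 6, 32, 10

# Stand-in for object-centric representations from unsupervised disentanglement.
slots = rng.normal(size=(K, d))
# Stand-in for a learned projection into the semantic concept space.
W_proj = rng.normal(size=(d, C))

# 1) Project object-centric representations onto the concept space.
concepts = slots @ W_proj                      # shape (K, C)

# 2) Estimate per-slot importance (softmax over a scalar score) and
#    filter out low-importance, irrelevant elements.
scores = concepts.max(axis=1)
importance = np.exp(scores) / np.exp(scores).sum()
keep = importance > 1.0 / (2 * K)              # hypothetical threshold

# 3) Build an importance-weighted concept graph: cosine-similarity
#    adjacency scaled by the pairwise importance of the kept slots.
kept = concepts[keep]
unit = kept / np.linalg.norm(kept, axis=1, keepdims=True)
A = (unit @ unit.T) * np.outer(importance[keep], importance[keep])

# 4) One step of iterative refinement: message passing over the graph.
refined = A @ kept
```

In the actual framework the projection and graph are learned and the refinement is run for multiple iterations; the sketch only shows how the four stages compose.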
Problem

Research questions and friction points this paper is trying to address.

Enhancing foundation models' generalization to out-of-domain data
Extracting high-level concepts and relational structures from visuals
Improving robustness of models like SAM and CLIP across tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised decoupling binds objects to representations
Semantic concept space filters irrelevant elements
Concept-based graph enables high-order relational reasoning
Luyao Tang
HKU
Machine Learning · Open-World Learning · Generalized Category Discovery · Medical AI
Yuxuan Yuan
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; School of Informatics, Xiamen University
Chaoqi Chen
Shenzhen University
Machine Learning · Computer Vision · Trustworthy AI · Data-centric AI
Zeyu Zhang
The Australian National University
Yue Huang
Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; School of Informatics, Xiamen University
Kun Zhang
Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence