Looking Beyond the Known: Towards a Data Discovery Guided Open-World Object Detection

📅 2025-09-30
🤖 AI Summary
Open-world object detection (OWOD) faces two core challenges: semantic confusion between known and unknown classes, and catastrophic forgetting. Together these lower unknown-class recall and degrade known-class accuracy. To address them, the paper proposes CROWD (Combinatorial Open-World Detection), a unified framework that recasts unknown-object discovery and representation learning as an interwoven combinatorial (set-based) data-discovery and learning task. CROWD-Discover selects representative unknown instances that are distinctly dissimilar from known objects by maximizing submodular conditional gain (SCG) functions. CROWD-Learn then applies combinatorial objectives that disentangle known from unknown representations while keeping the known classes discriminable from one another, which mitigates both confusion and forgetting. On the M-OWODB and S-OWODB benchmarks, CROWD improves known-class accuracy by 2.83% and 2.05%, respectively, and reaches nearly 2.4× the unknown recall of leading baselines.

📝 Abstract
Open-World Object Detection (OWOD) enriches traditional object detectors by enabling continual discovery and integration of unknown objects via human guidance. However, existing OWOD approaches frequently suffer from semantic confusion between known and unknown classes, alongside catastrophic forgetting, leading to diminished unknown recall and degraded known-class accuracy. To overcome these challenges, we propose Combinatorial Open-World Detection (CROWD), a unified framework reformulating unknown object discovery and adaptation as an interwoven combinatorial (set-based) data-discovery (CROWD-Discover) and representation learning (CROWD-Learn) task. CROWD-Discover strategically mines unknown instances by maximizing Submodular Conditional Gain (SCG) functions, selecting representative examples distinctly dissimilar from known objects. Subsequently, CROWD-Learn employs novel combinatorial objectives that jointly disentangle known and unknown representations while maintaining discriminative coherence among known classes, thus mitigating confusion and forgetting. Extensive evaluations on OWOD benchmarks illustrate that CROWD achieves improvements of 2.83% and 2.05% in known-class accuracy on M-OWODB and S-OWODB, respectively, and nearly 2.4x unknown recall compared to leading baselines.
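As background for the discovery step described above, the standard definition of submodular conditional gain can be written as follows. This is the general textbook form, not necessarily the exact instantiation used in the paper; the symbols $U$ (pool of candidate unknown instances), $Q$ (set of known-object exemplars), and $k$ (selection budget) are notation assumed here for illustration.

```latex
% f is a submodular set function; A is the selected subset of candidates.
f(A \mid Q) \;=\; f(A \cup Q) \;-\; f(Q),
\qquad
A^{*} \;=\; \operatorname*{arg\,max}_{A \subseteq U,\; |A| \le k} \; f(A \mid Q)
```

Intuitively, $f(A \mid Q)$ measures how much $A$ adds once $Q$ is already accounted for, so maximizing it favors candidates that are representative of the pool yet poorly explained by the known objects.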

Problem

Research questions and friction points this paper is trying to address.

Addresses semantic confusion between known and unknown object classes
Mitigates catastrophic forgetting in continual object discovery
Improves unknown object recall while maintaining known-class accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses combinatorial framework for object discovery and learning
Mines unknown instances via submodular conditional gain functions
Employs combinatorial objectives to disentangle known and unknown representations
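
The mining step listed above can be illustrated with a minimal sketch of greedy conditional-gain maximization. Everything here is an assumption made for illustration: the facility-location function as the choice of submodular function, cosine similarity over feature vectors, the greedy optimizer, and the names `cosine_similarity` and `greedy_scg_select`. None of it is taken from the CROWD implementation.

```python
import numpy as np


def cosine_similarity(x, y):
    """Pairwise cosine similarity, clipped to [0, 1]. Assumes no all-zero rows."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return np.clip(x @ y.T, 0.0, None)


def greedy_scg_select(candidates, known, budget):
    """Greedily pick candidate indices that maximize f(A | Q) = f(A ∪ Q) - f(Q).

    Hypothetical helper: f is a facility-location function evaluated over the
    candidate pool, A is the selected subset, Q is the set of known exemplars.
    """
    n = len(candidates)
    sim_cc = cosine_similarity(candidates, candidates)  # shape (n, n)
    sim_ck = cosine_similarity(candidates, known)       # shape (n, m)

    # coverage[i]: best similarity of candidate i to anything in Q (later A ∪ Q).
    coverage = sim_ck.max(axis=1)
    available = np.ones(n, dtype=bool)
    selected = []

    for _ in range(min(budget, n)):
        # Marginal gain of adding j: sum_i max(sim_cc[i, j] - coverage[i], 0).
        gains = np.clip(sim_cc - coverage[:, None], 0.0, None).sum(axis=0)
        gains[~available] = -np.inf
        best = int(np.argmax(gains))
        selected.append(best)
        available[best] = False
        coverage = np.maximum(coverage, sim_cc[:, best])

    return selected


# Toy usage: candidates 0 and 1 resemble the known exemplar, 2 and 3 do not.
known_feats = np.array([[1.0, 0.0]])
candidate_feats = np.array([[1.0, 0.05], [0.98, 0.1], [0.0, 1.0], [0.05, 1.0]])
picked = greedy_scg_select(candidate_feats, known_feats, budget=2)
print(picked)
```

Because the known exemplars already "cover" candidates that resemble them, those candidates yield almost no marginal gain, so the first pick in the toy example comes from the group that is dissimilar to the known set.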
Anay Majee
The University of Texas at Dallas, Microsoft, Intel
Submodular Functions, Few-shot Learning, Representation Learning, Computer Vision
Amitesh Gangrade
The University of Texas at Dallas
Rishabh Iyer
The University of Texas at Dallas