Detect in Any Scene: An Agentic Framework for Object Detection with Experience-Aware Reasoning

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

career value

170K/year

🤖 AI Summary

This work addresses the challenge of limited generalization in existing object detectors caused by diverse image degradations and heterogeneous target distributions in real-world scenarios. To this end, we propose DetAS, an agent-based object detection framework that formulates detection as a dynamic decision-making process. DetAS leverages a multimodal large language model to adaptively compose image enhancement modules with specialized detectors and introduces an experience-aware reasoning mechanism, DetAS-X, which enables continuous optimization and scene adaptation of detection strategies through node-level experience accumulation. By integrating adaptive image restoration, multi-expert detector ensembles, and self-evolving experience collection, our method achieves an average F1-score improvement of 28.36% across six benchmarks, with gains up to 37.01% on the DarkFace dataset, significantly outperforming current MLLM-based detection approaches.

📝 Abstract

Object detection in real-world scenarios remains challenging due to diverse image degradations and heterogeneous object distributions, which significantly hinder the generalization of existing detectors. Conventional approaches, including scene-specific representation learning and end-to-end pipeline design, are inherently limited by their reliance on predefined conditions and lack adaptability to dynamic environments. In this paper, we propose DetAS, an agentic detection framework that formulates object detection as a dynamic decision process. Instead of relying on static pipelines, DetAS leverages a Multimodal Large Language Model (MLLM) as a central agent to adaptively compose detection workflows by selecting from a toolbox of restoration modules and specialized detectors. Specifically, DetAS consists of two key components: Self-Adaptive Image Restoration, which dynamically determines whether and how to enhance images for downstream detection, and Multi-Expertise Detection, which integrates multiple domain-specialized detectors and resolves their predictions through instance-level reasoning. To further improve decision quality under fine-grained conditions, we introduce Self-Evolving Experience Harvesting and extend the framework to DetAS-X, which accumulates node-level decision experience from a small set of annotated data and enables experience-aware reasoning during inference. This mechanism allows the system to progressively refine its decision policy and adapt to diverse real-world scenarios. Extensive experiments on six challenging benchmarks demonstrate that DetAS-X significantly outperforms existing MLLM-based detectors, achieving an average improvement of 28.36% in F1 score, with up to 37.01% gain on DarkFace. These results demonstrate the promise of agentic detection and establish a solid foundation for its application in complex and dynamic environments.

Problem

Research questions and friction points this paper is trying to address.

object detection

image degradation

heterogeneous object distribution

dynamic environments

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Detection

Multimodal Large Language Model

Self-Adaptive Restoration