FlowDet: Unifying Object Detection and Generative Transport Flows

📅 2025-12-18
🤖 AI Summary
This work revisits diffusion-based object detection, as introduced by DiffusionDet, and reformulates it as a **conditional flow matching (CFM)-driven generative transport problem**. It presents itself as the first formulation of object detection using modern CFM techniques, generalising the denoising view of detection to a broader class of generative transport problems while generating bounding box coordinates end to end and integrating multi-scale features. Where diffusion induces curved, stochastic transport paths, CFM learns simpler, straighter ones, so detection performance scales faster as the number of inference steps grows. The model **retains the ability to vary the number of boxes and inference steps without retraining**. Experiments across multiple feature backbones and datasets report gains of up to +3.6% AP on COCO and +4.2% AP$_{rare}$ on LVIS over DiffusionDet. The effect of the generative transport is clearest in recall-constrained settings, where it is not masked by a large number of proposals.

📝 Abstract
We present FlowDet, the first formulation of object detection using modern Conditional Flow Matching techniques. This work follows from DiffusionDet, which originally framed detection as a generative denoising problem in the bounding box space via diffusion. We revisit and generalise this formulation to a broader class of generative transport problems, while maintaining the ability to vary the number of boxes and inference steps without retraining. In contrast to the curved stochastic transport paths induced by diffusion, FlowDet learns simpler and straighter paths, resulting in faster scaling of detection performance as the number of inference steps grows. We find that this reformulation enables us to outperform diffusion-based detection systems (as well as non-generative baselines) across a wide range of experiments, including various precision/recall operating points, using multiple feature backbones and datasets. In particular, evaluating under recall-constrained settings lets us highlight the effects of the generative transport without over-compensating with large numbers of proposals. This provides gains of up to +3.6% AP and +4.2% AP$_{rare}$ over DiffusionDet on the COCO and LVIS datasets, respectively.
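
The claim that the number of boxes and the number of inference steps can both change at inference time can be illustrated with a minimal sketch. It is not FlowDet's implementation: `euler_sample`, `oracle_velocity`, and `gt_box` are hypothetical names, and the oracle stands in for a trained network that would also condition on image features. The sketch assumes a straight (linear) transport path and fixed-step Euler integration.

```python
import numpy as np

# Hypothetical target box (cx, cy, w, h) in normalised coordinates.
gt_box = np.array([0.5, 0.5, 0.2, 0.3])


def oracle_velocity(x, t):
    """Toy stand-in for a learned velocity field.

    On a straight path x_t = (1 - t) * x0 + t * x1 the velocity x1 - x0
    equals (x1 - x_t) / (1 - t), so it can be computed from x_t alone.
    """
    return (gt_box - x) / (1.0 - t)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # largest t used is 1 - dt, so 1 - t never reaches zero
        x = x + dt * velocity_fn(x, t)
    return x


# Box count and step count are plain inference-time arguments.
rng = np.random.default_rng(0)
few_boxes_one_step = euler_sample(oracle_velocity, rng.standard_normal((10, 4)), 1)
many_boxes_four_steps = euler_sample(oracle_velocity, rng.standard_normal((300, 4)), 4)
```

Because the path in this toy setup is exactly straight, a single Euler step already lands on the target. A learned field is only approximately straight, which is consistent with the abstract's observation that performance still improves as steps are added.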
Problem

Research questions and friction points this paper is trying to address.

Reformulating object detection as a generative transport problem using Conditional Flow Matching.
Keeping box counts and inference steps variable without retraining, for flexible detection.
Improving detection performance over diffusion-based methods with faster, straighter transport paths.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies object detection with Conditional Flow Matching
Learns straighter generative transport paths than diffusion
Varies box count and inference steps without retraining
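
For reference, the standard linear-path form of the conditional flow matching objective is sketched below. The summary does not state FlowDet's exact path or loss, so this is the generic formulation rather than the paper's own. The symbols are assumptions: $x_0$ denotes noise boxes, $x_1$ ground-truth boxes, $c$ the image features, $v_\theta$ the learned velocity field, and $t$ is drawn uniformly from $[0, 1]$.

$$
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1} \left\| v_\theta(x_t, t, c) - (x_1 - x_0) \right\|^2
$$

The regression target $x_1 - x_0$ is constant along each path, which is what makes the learned trajectories straight. Boxes are then generated by integrating $\mathrm{d}x/\mathrm{d}t = v_\theta(x, t, c)$ from $t = 0$ to $t = 1$.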