🤖 AI Summary
This work addresses poor recognition of rare vehicle colors under real-world surveillance conditions, a problem caused primarily by long-tailed class distributions. To mitigate it, the study augments underrepresented color classes with synthetic data from two off-the-shelf generative strategies: text-conditioned image generation with RunDiffusion/JuggernautXL and image-conditioned color editing with Gemini 2.0 Flash. The pipeline combines the curated synthetic data with foreground-aware preprocessing, color-safe augmentation, loss reweighting, learning-rate scheduling, and modern vision backbones. With ensemble fusion, the best-performing approach reaches 94.6% micro accuracy and 79.7% macro accuracy on the UFPR-VeSV dataset, an improvement of 8.2 percentage points in macro accuracy over recent literature.
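Loss reweighting, one of the components listed above, can be illustrated with a small sketch. The summary does not state which weighting scheme the authors use, so the inverse-frequency ("balanced") heuristic below is an assumption, and the function name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a per-class weight n_samples / (n_classes * class_count).

    Assumption: this is the common "balanced" heuristic, not necessarily
    the weighting used in the paper. Rare classes receive larger weights,
    so their errors contribute more to a weighted loss.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy distribution: 90 white vehicles, 10 green vehicles.
weights = inverse_frequency_weights(["white"] * 90 + ["green"] * 10)
```

In this toy case the rare class (`green`) is weighted nine times more heavily than the frequent one. Such weights would typically be passed to a weighted cross-entropy loss during training.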
📝 Abstract
Vehicle color recognition is an important cue for vehicle identification in surveillance systems, especially when license plates are illegible due to low resolution, occlusion, motion blur, or poor illumination. However, real-world vehicle color distributions are highly imbalanced, making overall accuracy insufficient to assess performance on rare but operationally relevant colors. This paper presents a comprehensive study of vehicle color recognition under severe class imbalance using UFPR-VeSV, a challenging real-world surveillance dataset. We investigate synthetic minority-class augmentation through two off-the-shelf generative strategies: text-conditioned image generation with RunDiffusion/JuggernautXL and image-conditioned color editing with Gemini 2.0 Flash. The curated synthetic data are combined with modern visual representations, loss reweighting, learning-rate scheduling, color-safe augmentation, foreground-aware preprocessing, and ensemble fusion. The best-performing approach achieves 94.6% micro accuracy and 79.7% macro accuracy, improving macro accuracy by 8.2 percentage points over recent literature. A manual error analysis further shows that many remaining failures are visually ambiguous even for human annotators, highlighting the practical limits of color-based vehicle identification in unconstrained surveillance imagery. The generated images and source code are publicly available at https://github.com/viniciusorru/vcr-synthetic
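The gap between micro and macro accuracy reported above can be made concrete with a small sketch. It assumes that micro accuracy is the fraction of all samples classified correctly and that macro accuracy is the unweighted mean of per-class accuracies. The abstract does not spell out these definitions, and the function name `micro_macro_accuracy` and the toy data are hypothetical.

```python
from collections import defaultdict


def micro_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy for paired label sequences.

    Assumed definitions (not confirmed by the abstract):
      micro = correct predictions / all samples
      macro = mean over classes of (correct in class / samples in class)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    micro = sum(correct.values()) / len(y_true)
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro


# Hypothetical imbalanced toy set: a classifier that always predicts "white".
y_true = ["white"] * 95 + ["green"] * 5
y_pred = ["white"] * 100
micro, macro = micro_macro_accuracy(y_true, y_pred)
```

Here the degenerate classifier scores 0.95 micro accuracy but only 0.5 macro accuracy, because the rare class is never recognized. This is why overall accuracy alone says little about rare colors.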