A Comprehensive Overview of Deep Learning Models for Object Detection from Videos/Images

📅 2026-01-21
🏛️ International Journal of Artificial Intelligence and Soft Computing
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenges of robustness and real-time performance in object detection for video and image surveillance under dynamic environments, occlusions, and varying illumination. It presents a systematic review of deep learning-based approaches, offering a novel taxonomy along three dimensions: core architectures, data processing strategies, and surveillance-specific challenges. The work critically examines the roles of CNN-based detectors and generative models, particularly GANs, in tasks such as frame reconstruction, occlusion mitigation, and illumination normalization, alongside mechanisms for temporal information fusion. Through a comprehensive evaluation of prevailing models, benchmark datasets, and performance metrics, the paper assesses the current effectiveness of semantic object detection and identifies promising future directions, including low-latency inference, efficient modeling, and joint spatiotemporal learning.

📝 Abstract
Object detection in video and image surveillance is a well-established yet rapidly evolving task, strongly influenced by recent deep learning advancements. This review summarises modern techniques by examining architectural innovations, generative model integration, and the use of temporal information to enhance robustness and accuracy. Unlike earlier surveys, it classifies methods based on core architectures, data processing strategies, and surveillance-specific challenges such as dynamic environments, occlusions, lighting variations, and real-time requirements. The primary goal is to evaluate the current effectiveness of semantic object detection, while secondary aims include analysing deep learning models and their practical applications. The review covers CNN-based detectors, GAN-assisted approaches, and temporal fusion methods, highlighting how generative models support tasks such as reconstructing missing frames, reducing occlusions, and normalising illumination. It also outlines preprocessing pipelines, feature extraction progress, benchmarking datasets, and comparative evaluations. Finally, emerging trends in low-latency, efficient, and spatiotemporal learning approaches are identified for future research.
Problem

Research questions and friction points this paper is trying to address.

object detection
video surveillance
deep learning
occlusions
real-time requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative models
temporal fusion
occlusion handling
real-time object detection
spatiotemporal learning
Sukana Zulfqar
Department of Computer Science, University of Agriculture Faisalabad, Faisalabad, 38000, Punjab, Pakistan
Sadia Saeed
Faculty of Information Technology and Computer Science (FoIT&CS), University of Central Punjab, Lahore, 54783, Punjab, Pakistan
M. A. Zia
Department of Computer Science, University of Agriculture Faisalabad, Faisalabad, 38000, Punjab, Pakistan
Anjum Ali
Department of Software Engineering, Riphah International University, Faisalabad Campus, Pakistan
Faisal Mehmood
Gachon University
IoT, Machine Learning, Computer Vision
Abid Ali
INRIA Sophia Antipolis
computer vision, deep learning, action recognition, action localization, gaze estimation