🤖 AI Summary
Problem: Efficient and robust automatic identification of intermodal loading units (e.g., shipping containers, semi-trailers, swap bodies) remains a critical bottleneck in high-throughput ports and terminals. Method: This study reviews 63 empirical studies on computer vision–based identification published from 1990 to 2025. It traces the field's evolution from classical digital image processing and traditional machine learning to deep learning. Comparing the studies, it attributes the wide variance in reported end-to-end accuracy (5%–96%) mainly to the lack of publicly available benchmark datasets. Contribution/Results: The review calls for standardized terminology, open-access datasets, and shared source code so that results become reproducible and comparable. It highlights emerging challenges from the shift from character-based text recognition to scene-text spotting and from the use of mobile cameras (e.g., drones, sensor-equipped ground vehicles) for dynamic terminal monitoring. It also outlines future directions such as contextless text recognition optimized for ISO 6346 codes.
📝 Abstract
The standardisation of Intermodal Loading Units (ILUs), such as containers, semi-trailers and swap bodies, has revolutionised global trade, yet their efficient and robust identification remains a critical bottleneck in high-throughput ports and terminals. This paper reviews 63 empirical studies that propose computer vision (CV)-based solutions. It covers the last 35 years (1990–2025), tracing the field's evolution from early digital image processing (DIP) and traditional machine learning (ML) to the current dominance of deep learning (DL) techniques. CV offers cost-effective alternatives to other identification techniques, but its development is hindered by the lack of publicly available benchmarking datasets. As a result, reported results vary widely, with end-to-end accuracy ranging from 5% to 96%. Beyond dataset limitations, this review highlights emerging challenges. These arise especially from the shift from character-based text recognition to scene-text spotting and from the integration of mobile cameras (e.g. drones, sensor-equipped ground vehicles) for dynamic terminal monitoring. To advance the field, the paper calls for standardised terminology, open-access datasets and shared source code. It also outlines future research directions such as contextless text recognition optimised for ISO 6346 codes.
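One reason ISO 6346 codes lend themselves to contextless recognition is that each code carries its own check digit, so a recogniser can test a reading without any surrounding scene context. The Python below is a minimal sketch of that idea. It is not the reviewed authors' method. It assumes the common ISO 6346 layout: a 3-letter owner code, a category identifier (U, J or Z), a 6-digit serial number and 1 check digit. Letters map to values 10 to 38, skipping multiples of 11. Each of the first ten characters is weighted by 2^position, and the sum is reduced modulo 11 and then modulo 10. The function names (`check_digit`, `is_valid_container_code`, `select_valid`) are hypothetical, and the OCR candidate list is illustrative only.

```python
import re
import string
from typing import Dict, Iterable, Optional

# Assumed ISO 6346 layout: 3-letter owner code, category identifier (U, J or Z),
# 6-digit serial number, 1 check digit. Helper names here are hypothetical.
CODE_PATTERN = re.compile(r"[A-Z]{3}[UJZ][0-9]{7}")


def _letter_values() -> Dict[str, int]:
    """Map A..Z to 10..38, skipping multiples of 11 (11, 22, 33)."""
    values: Dict[str, int] = {}
    value = 10
    for letter in string.ascii_uppercase:
        if value % 11 == 0:
            value += 1
        values[letter] = value
        value += 1
    return values


LETTER_VALUES = _letter_values()


def check_digit(first_ten: str) -> int:
    """Compute the check digit from the first 10 characters of a code."""
    total = 0
    for position, char in enumerate(first_ten):
        value = int(char) if char.isdigit() else LETTER_VALUES[char]
        # Each character is weighted by 2 ** position (1, 2, 4, ..., 512).
        total += value * 2 ** position
    # A remainder of 10 is mapped to check digit 0 in this sketch.
    return total % 11 % 10


def normalise(code: str) -> str:
    """Drop spaces and upper-case the text, as OCR output may vary."""
    return code.replace(" ", "").upper()


def is_valid_container_code(code: str) -> bool:
    """True if the code matches the layout and its check digit agrees."""
    code = normalise(code)
    if CODE_PATTERN.fullmatch(code) is None:
        return False
    return check_digit(code[:10]) == int(code[10])


def select_valid(candidates: Iterable[str]) -> Optional[str]:
    """Return the first OCR hypothesis that passes the check, else None."""
    for candidate in candidates:
        if is_valid_container_code(candidate):
            return normalise(candidate)
    return None


# Two hypothetical OCR readings of the same code; only the second is consistent.
print(select_valid(["CSQU3054388", "CSQU3054383"]))  # prints CSQU3054383
```

The check digit cannot recover a misread character on its own. It only lets a pipeline reject inconsistent hypotheses, for example when choosing among an OCR model's top-k outputs.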