Exploring Light-Weight Object Recognition for Real-Time Document Detection

📅 2025-09-07

🤖 AI Summary

Problem: Real-time document detection and rectification must trade model size against OCR readability. Method: The paper adapts IWPOD-Net, a license plate detection network, into a lightweight document detector oriented toward OCR quality. It trains on the synthetic NBID dataset, uses cross-dataset validation with MIDV, applies data augmentation, and regresses document corner points directly. Instead of conventional localization accuracy, it evaluates with an OCR-feedback metric based on the Levenshtein distance. Results: The model attains competitive OCR quality on identity cards and passports while remaining smaller and faster than current state-of-the-art solutions. Its key contribution is the empirical observation that rectification does not have to be perfect for OCR, which reduces reliance on pixel-level geometric precision.
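
To make the rectification step concrete, here is a minimal sketch of how four predicted corner points can define a perspective transform onto an upright rectangle. It is an illustration only, not the authors' implementation: the function names, the example corner coordinates, and the 640x400 target size are all assumptions.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Solve for the 3x3 perspective transform mapping 4 src points to 4 dst points.

    Hypothetical helper: each correspondence (x, y) -> (u, v) contributes two
    linear equations in the eight unknown entries of H (with H[2, 2] fixed to 1).
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, points):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Assumed example: corners of a skewed document (top-left, top-right,
# bottom-right, bottom-left) and the upright rectangle they should map to.
corners = [(120, 80), (520, 110), (500, 400), (90, 360)]
target = [(0, 0), (640, 0), (640, 400), (0, 400)]

H = homography_from_corners(corners, target)
rectified_corners = apply_homography(H, corners)
```

In practice the same matrix would be used to warp the image itself before passing it to the OCR system; small errors in the predicted corners then produce a slightly imperfect rectangle, which is the situation the summary describes as still sufficient for OCR.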

📝 Abstract

Object Recognition and Document Skew Estimation have come a long way in terms of performance and efficiency. New models follow one of two directions: improving performance using larger models, and improving efficiency using smaller models. However, real-time document detection and rectification is a niche that is largely unexplored by the literature, yet it remains a vital step for automatic information retrieval from visual documents. In this work, we strive towards an efficient document detection pipeline that is satisfactory in terms of Optical Character Recognition (OCR) retrieval and faster than other available solutions. We adapt IWPOD-Net, a license plate detection network, and train it for detection on NBID, a synthetic ID card dataset. We experiment with data augmentation and cross-dataset validation with MIDV (another synthetic ID and passport document dataset) to find the optimal scenario for the model. Other methods from both the Object Recognition and Skew Estimation state-of-the-art are evaluated for comparison with our approach. We use each method to detect and rectify the document, which is then read by an OCR system. The OCR output is then evaluated using a novel OCR quality metric based on the Levenshtein distance. Since the end goal is to improve automatic information retrieval, we use the overall OCR quality as a performance metric. We observe that with a promising model, document rectification does not have to be perfect to attain state-of-the-art performance scores. We show that our model is smaller and more efficient than current state-of-the-art solutions while retaining a competitive OCR quality metric. All code is available at https://github.com/BOVIFOCR/iwpod-doc-corners.git
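
The abstract does not give the exact definition of the OCR quality metric, only that it is based on the Levenshtein distance. As a rough sketch of what such a metric can look like, the code below computes the edit distance between the OCR output and a reference text and normalizes it into a score between 0 and 1. The normalization by the longer string's length and the names `levenshtein` and `ocr_quality` are assumptions for illustration, not the paper's formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ocr_quality(predicted: str, reference: str) -> float:
    """Assumed normalization: 1.0 for a perfect read, 0.0 for a fully wrong one."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0  # two empty strings agree trivially
    return 1.0 - levenshtein(predicted, reference) / longest


# A single misread character ("0" instead of "O") lowers the score only slightly.
score = ocr_quality("JOHN D0E", "JOHN DOE")
```

A score of this kind rewards any rectification that leaves the text readable, which is why it can rank a slightly imperfect crop as highly as a geometrically exact one.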

Problem

Research questions and friction points this paper is trying to address.

Developing efficient real-time document detection and rectification

Optimizing OCR retrieval performance with lightweight models

Enhancing automatic information retrieval from visual documents

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapted IWPOD-Net for document detection

Used synthetic datasets (NBID, MIDV) with cross-dataset validation

Evaluated with a novel OCR quality metric based on the Levenshtein distance
