Seeing Straight: Document Orientation Detection for Efficient OCR

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the degradation of OCR performance caused by document image orientation misalignment (e.g., due to camera tilt during capture), proposing a lightweight and robust four-way rotation classification method. Methodologically, it introduces an end-to-end fine-tuned pipeline leveraging the Phi-3.5-Vision visual encoder and dynamic cropping. Its key contributions are: (1) the first multi-lingual benchmark—OCR-Rotation-Bench (ORB)—designed specifically for evaluating rotation robustness in OCR, covering English and 11 low-resource Indian languages; (2) a novel architecture optimized for efficient orientation classification; and (3) state-of-the-art results: 96% accuracy on ORB-En and 92% on ORB-Indic, with real-world improvements of +14% for proprietary OCR systems and up to 4× gains for open-source OCR. The code and benchmark are publicly released.

📝 Abstract
Despite significant advances in document understanding, determining the correct orientation of scanned or photographed documents remains a critical pre-processing step in real-world settings. Accurate rotation correction is essential for enhancing the performance of downstream tasks such as Optical Character Recognition (OCR), where misalignment commonly arises from user error, particularly incorrect base orientations of the camera during capture. In this study, we first introduce OCR-Rotation-Bench (ORB), a new benchmark for evaluating OCR robustness to image rotations, comprising (i) ORB-En, built from rotation-transformed structured and free-form English OCR datasets, and (ii) ORB-Indic, a novel multilingual set spanning 11 mid- to low-resource Indic languages. We also present a fast, robust, and lightweight rotation classification pipeline built on the vision encoder of the Phi-3.5-Vision model with dynamic image cropping, fine-tuned specifically for the 4-class rotation task in a standalone fashion. Our method achieves near-perfect accuracy in identifying rotations: 96% on ORB-En and 92% on ORB-Indic. Beyond classification, we demonstrate the critical role of our module in boosting OCR performance in a simulated real-world setting, for both closed-source (up to 14%) and open-weights models (up to 4×).
Problem

Research questions and friction points this paper is trying to address.

Detecting document orientation to improve OCR efficiency and accuracy
Correcting image rotations caused by camera misalignment during capture
Enhancing OCR performance for both structured and multilingual documents
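The rotation-transformed benchmark construction underlying ORB can be sketched as follows. This is a minimal illustration, not the authors' released code: each page image is rotated by 0°, 90°, 180°, or 270°, and the rotation index becomes the 4-way class label that the classifier must predict (and invert) before OCR.

```python
import numpy as np

# The four canonical orientations of the 4-way rotation classification task.
ROTATIONS = [0, 90, 180, 270]

def make_rotation_samples(image: np.ndarray) -> list[tuple[np.ndarray, int]]:
    """Return (rotated_image, class_label) pairs for the four orientations.

    Label k means the image was rotated by 90*k degrees counter-clockwise,
    so the correction is a rotation by -90*k degrees.
    """
    return [(np.rot90(image, k=k), k) for k in range(4)]

def correct_orientation(image: np.ndarray, predicted_label: int) -> np.ndarray:
    """Undo a predicted rotation by rotating back the other way."""
    return np.rot90(image, k=-predicted_label)

# Example: a tiny 2x3 "page" with distinct values so orientation matters.
page = np.arange(6).reshape(2, 3)
samples = make_rotation_samples(page)
# Every rotated sample maps back to the original under its label.
assert all(np.array_equal(correct_orientation(img, lbl), page)
           for img, lbl in samples)
```

In practice the rotated images would feed a vision-encoder classifier; the label convention (and the choice of counter-clockwise rotation) here is an assumption for illustration.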
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic image cropping for rotation classification
Fine-tuned Phi-3.5-Vision encoder for document orientation
Lightweight pipeline achieving 92-96% rotation accuracy
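The paper describes dynamic image cropping only at a high level. One plausible reading, sketched below purely as an assumption (the `dynamic_crop` heuristic and its threshold are not from the paper), is trimming empty page margins so the encoder's input is dominated by text content regardless of page geometry:

```python
import numpy as np

def dynamic_crop(image: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Crop a grayscale page to the bounding box of non-background pixels.

    Hypothetical heuristic: treat pixels darker than `threshold` as ink
    on a light page and trim the empty margins around them. The paper's
    actual cropping strategy may differ.
    """
    mask = image < threshold
    if not mask.any():
        return image  # blank page: nothing to crop
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# A white page (1.0) with a dark text block (0.0) offset from the margins.
page = np.ones((100, 80))
page[20:60, 10:50] = 0.0
cropped = dynamic_crop(page)
assert cropped.shape == (40, 40)
```

The cropped image would then be resized to the vision encoder's input resolution before the 4-class rotation prediction.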
Suranjan Goswami
OLA Electric, Bangalore, India
Abhinav Ravi
Krutrim AI, Bangalore, India
Raja Kolla
Krutrim AI, Bangalore, India
Ali Faraz
Data Scientist, Krutrim
Machine Learning, LLMs, LVMs, Computer Vision
Shaharukh Khan
Unknown affiliation
Machine Learning, VLM
Akash
OLA Electric, Bangalore, India
Chandra Khatri
Ola Krutrim AI
Artificial Intelligence, Multi-Modal AI, Conversational AI, Deep Learning, Machine Learning
Shubham Agarwal
Krutrim AI, Bangalore, India