🤖 AI Summary
This study systematically evaluates AWS Textract’s performance in extracting structured fields—particularly the total amount—from real-world receipts spanning diverse formats, quality levels, and degradation conditions (e.g., blur, skew, occlusion), revealing critical failures in layout understanding and robustness. Method: We propose the first fine-grained, receipt-specific diagnostic framework that jointly models image quality and layout features to enable interpretable failure attribution. Based on empirical analysis, we design an end-to-end optimization pipeline integrating targeted preprocessing (e.g., skew correction, contrast enhancement) and a rule-based post-processing engine. Contribution/Results: Our framework establishes a reproducible OCR diagnostic paradigm for receipt processing. Experiments show 98.2% recall for total-amount extraction, yet performance degrades significantly under low-quality imaging—highlighting key failure modes. The proposed pipeline delivers empirically validated, production-ready mitigation strategies, bridging the gap between diagnostic insight and deployable OCR engineering.
📝 Abstract
This paper evaluates AWS Textract for extracting data from receipts. We analyse Textract's functionality using a dataset of receipts in varied formats and conditions. The analysis provides a qualitative view of Textract's strengths and limitations. While receipt totals were consistently detected, we also observed recurring issues and irregularities that were often influenced by image quality and layout. Based on these observations, we propose mitigation strategies.
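The abstract does not spell out the mitigation strategies, so the following is only an illustrative sketch of what a rule-based post-processing step for the receipt total might look like. It is not the authors' implementation. It assumes the OCR output has already been reduced to a list of plain text lines, that amounts use a decimal point with two digits, and that the function names (`parse_amounts`, `extract_total`) and the keyword rules are hypothetical choices made for this example. No Textract API calls are involved.

```python
import re
from decimal import Decimal

# Assumed amount format: optional thousands commas, decimal point, two digits
# (e.g. 6.21 or 1,234.56). Other locales would need a different pattern.
AMOUNT_RE = re.compile(r"\d+(?:,\d{3})*\.\d{2}(?!\d)")

# Hypothetical keyword rules: accept "total", reject "subtotal" variants.
TOTAL_RE = re.compile(r"\btotal\b")
SUBTOTAL_RE = re.compile(r"sub[\s-]?total")


def parse_amounts(line):
    """Return every amount found in one OCR text line as a Decimal."""
    return [Decimal(m.replace(",", "")) for m in AMOUNT_RE.findall(line)]


def extract_total(lines):
    """Pick the receipt total from OCR text lines using simple rules.

    Rule 1: prefer the last line that mentions "total" (but not "subtotal")
            and carries an amount; take the last amount on that line.
    Rule 2: if no such line exists (e.g. the keyword was misread), fall back
            to the largest amount found anywhere on the receipt.
    Returns None when no amount is found at all.
    """
    keyword_hits = []
    all_amounts = []
    for line in lines:
        amounts = parse_amounts(line)
        all_amounts.extend(amounts)
        lowered = line.lower()
        if amounts and TOTAL_RE.search(lowered) and not SUBTOTAL_RE.search(lowered):
            keyword_hits.append(amounts[-1])
    if keyword_hits:
        return keyword_hits[-1]
    return max(all_amounts) if all_amounts else None


if __name__ == "__main__":
    receipt = [
        "COFFEE 3.50",
        "BAGEL 2.25",
        "SUBTOTAL 5.75",
        "TAX 0.46",
        "TOTAL 6.21",
    ]
    print(extract_total(receipt))
```

The fallback to the largest amount is a deliberate simplification: it can be wrong when a receipt lists a cash-tendered value larger than the total, so a production rule set would likely need additional checks.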