SELECT: Detecting Label Errors in Real-world Scene Text Data

📅 2025-12-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses pervasive label noise in real-world scene text datasets, focusing on three challenges: variable-length sequence labels, label sequence misalignment, and character-level annotation errors (e.g., confusions between visually similar characters). We propose SELECT (Scene tExt Label Errors deteCTion), the first method to successfully detect label errors in real-world scene text datasets while accounting for variable-length labels. SELECT combines an image-text encoder with a character-level tokenizer and is trained in a multi-modal fashion. To mimic real-world error scenarios during training, we also introduce Similarity-based Sequence Label Corruption (SSLC), which intentionally corrupts training labels, can change the sequence length, and takes the visual similarity between characters into account. Experiments on real-world scene text datasets show that SELECT outperforms existing approaches at label error detection, yielding an average 3.2% improvement in Scene Text Recognition (STR) accuracy. The results support both the effectiveness and the practical utility of the method for label noise detection in scene text data.

📝 Abstract
We introduce SELECT (Scene tExt Label Errors deteCTion), a novel approach that leverages multi-modal training to detect label errors in real-world scene text datasets. Utilizing an image-text encoder and a character-level tokenizer, SELECT addresses the issues of variable-length sequence labels, label sequence misalignment, and character-level errors, outperforming existing methods in accuracy and practical utility. In addition, we introduce Similarity-based Sequence Label Corruption (SSLC), a process that intentionally introduces errors into the training labels to mimic real-world error scenarios during training. SSLC can change the sequence length and also takes into account the visual similarity between characters during corruption. Our method is the first to successfully detect label errors in real-world scene text datasets while accounting for variable-length labels. Experimental results demonstrate the effectiveness of SELECT in detecting label errors and improving STR accuracy on real-world text datasets, showcasing its practical utility.
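The label corruption idea described in the abstract can be illustrated with a small sketch. This is not the paper's SSLC implementation: the confusable-character groups, the edit probabilities `p_sub` and `p_ins`, and the function name `corrupt_label` are all assumptions made for illustration. It only shows the two properties the abstract names: a corruption may change the sequence length, and substitutions prefer visually similar characters.

```python
import random

# Hypothetical groups of visually confusable characters (illustrative only,
# not the similarity measure used in the paper).
SIMILAR_GROUPS = ["0oO", "1lI", "5sS", "2zZ", "8B", "6bG"]
SIMILAR = {}
for group in SIMILAR_GROUPS:
    for ch in group:
        SIMILAR[ch] = [c for c in group if c != ch]

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def corrupt_label(label, rng, p_sub=0.5, p_ins=0.25):
    """Apply one random edit: substitution, insertion, or deletion.

    The result always differs from the input (unless the input is empty),
    and its length differs from the input length by at most one.
    """
    if not label:
        return label
    pos = rng.randrange(len(label))
    r = rng.random()
    if r < p_sub:
        # Substitution: prefer a visually similar character when one is known.
        ch = label[pos]
        candidates = SIMILAR.get(ch) or [c for c in ALPHABET if c != ch]
        return label[:pos] + rng.choice(candidates) + label[pos + 1:]
    if r < p_sub + p_ins or len(label) == 1:
        # Insertion: the label grows by one character.
        return label[:pos] + rng.choice(ALPHABET) + label[pos:]
    # Deletion: the label shrinks by one character.
    return label[:pos] + label[pos + 1:]


rng = random.Random(0)
for clean in ["STOP", "EXIT 15", "Bus0"]:
    print(clean, "->", corrupt_label(clean, rng))
```

In a training setup along these lines, each corrupted label would be paired with its original image as a synthetic "erroneous" example, while the untouched label serves as the "clean" one.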
Problem

Research questions and friction points this paper is trying to address.

Detects label errors in real-world scene text datasets
Addresses variable-length sequence labels and misalignment issues
Improves scene text recognition accuracy through error detection
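To see why variable-length labels and misalignment make this harder than fixed-class label noise, consider a label that is missing a single character. The sketch below is an illustration only, not part of SELECT; the example strings and helper names are assumptions. It contrasts a naive position-by-position comparison with an alignment-aware edit distance.

```python
def positionwise_mismatches(a, b):
    """Count mismatches when comparing index by index, plus the length gap."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


label = "STARBUCKS"
noisy = "STRBUCKS"  # hypothetical annotation with one missing character

print(positionwise_mismatches(label, noisy))  # 7
print(edit_distance(label, noisy))            # 1
```

One dropped character shifts every later position, so a position-wise check reports seven mismatches where only one edit actually occurred. A detector for sequence labels therefore has to tolerate insertions and deletions rather than assume aligned positions.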
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal training for detecting label errors
Similarity-based Sequence Label Corruption for training
Addresses variable-length labels and character-level errors
Authors

Wenjun Liu
Yidun AI Lab, NetEase
Qian Wu
postdoctoral associate
analytical chemistry
Yifeng Hu
Yidun AI Lab, NetEase
Yuke Li
Yidun AI Lab, NetEase