SELECT: Detecting Label Errors in Real-world Scene Text Data

📅 2025-12-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses pervasive label noise in real-world scene text datasets, where labels are variable-length sequences that can be misaligned or contain character-level errors (e.g., confusions between visually similar characters). We propose SELECT (Scene tExt Label Errors deteCTion), which combines an image-text encoder with a character-level tokenizer and is the first method to detect label errors in real-world scene text data while accounting for variable-length labels. To train the detector, we introduce Similarity-based Sequence Label Corruption (SSLC), which deliberately corrupts training labels to mimic real-world errors; a corruption may change the sequence length and takes the visual similarity between characters into account. Experiments on multiple real-world scene text benchmarks show that SELECT outperforms existing label error detection approaches, yielding an average 3.2% improvement in Scene Text Recognition (STR) accuracy.

📝 Abstract

We introduce SELECT (Scene tExt Label Errors deteCTion), a novel approach that leverages multi-modal training to detect label errors in real-world scene text datasets. Utilizing an image-text encoder and a character-level tokenizer, SELECT addresses the issues of variable-length sequence labels, label sequence misalignment, and character-level errors, outperforming existing methods in accuracy and practical utility. In addition, we introduce Similarity-based Sequence Label Corruption (SSLC), a process that intentionally introduces errors into the training labels to mimic real-world error scenarios during training. SSLC not only can cause a change in the sequence length but also takes into account the visual similarity between characters during corruption. Our method is the first to detect label errors in real-world scene text datasets successfully accounting for variable-length labels. Experimental results demonstrate the effectiveness of SELECT in detecting label errors and improving STR accuracy on real-world text datasets, showcasing its practical utility.
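The SSLC idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the confusion table `CONFUSABLE`, the helper `similar_char`, and the uniform choice among substitute, insert, and delete operations are all assumptions made for illustration. The sketch only shows the two properties the abstract names, namely that a corruption can change the label length and that replacement characters are biased toward visually similar ones.

```python
import random
import string
from typing import Tuple

# Hypothetical table of visually confusable characters (an assumption;
# the summary does not say how the paper measures visual similarity).
CONFUSABLE = {
    "0": ["O", "o"],
    "O": ["0", "Q"],
    "1": ["l", "I"],
    "l": ["1", "I"],
    "I": ["l", "1"],
    "5": ["S"],
    "S": ["5"],
    "8": ["B"],
    "B": ["8"],
}
ALPHABET = string.ascii_letters + string.digits


def similar_char(ch: str, rng: random.Random) -> str:
    """Pick a look-alike for ch, or a random different character as fallback."""
    candidates = CONFUSABLE.get(ch)
    if candidates:
        return rng.choice(candidates)
    return rng.choice([c for c in ALPHABET if c != ch])


def corrupt_label(label: str, rng: random.Random) -> Tuple[str, str]:
    """Return (corrupted_label, operation) for a non-empty label."""
    if not label:
        raise ValueError("label must be non-empty")
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "delete" and len(label) == 1:
        op = "substitute"  # never produce an empty label
    pos = rng.randrange(len(label))
    if op == "substitute":
        # Same length, one character swapped for a look-alike.
        corrupted = label[:pos] + similar_char(label[pos], rng) + label[pos + 1:]
    elif op == "insert":
        # Length grows by one.
        corrupted = label[:pos] + similar_char(label[pos], rng) + label[pos:]
    else:
        # Length shrinks by one.
        corrupted = label[:pos] + label[pos + 1:]
    return corrupted, op


if __name__ == "__main__":
    rng = random.Random(0)
    for word in ["B00K", "EXIT", "S1OW"]:
        print(word, "->", corrupt_label(word, rng))
```

In a training setup of this kind, the corrupted labels would serve as known-bad examples paired with the original image, so a detector can learn to flag image-label mismatches.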

Problem

Research questions and friction points this paper is trying to address.

Detecting label errors in real-world scene text datasets

Handling variable-length sequence labels and label sequence misalignment

Improving scene text recognition accuracy through error detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal training for detecting label errors

Similarity-based Sequence Label Corruption for training

Image-text encoder with a character-level tokenizer to handle variable-length labels and character-level errors
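To make the role of a character-level tokenizer concrete, here is a minimal sketch of how variable-length text labels could be mapped to fixed-length ID sequences. The special tokens, the character set `CHARSET`, and the default `max_len` of 25 are assumptions for illustration; the summary does not describe the tokenizer SELECT actually uses.

```python
import string
from typing import List

# Assumed special tokens and character set (not taken from the paper).
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"
CHARSET = string.digits + string.ascii_letters + string.punctuation
VOCAB = [PAD, BOS, EOS] + list(CHARSET)
CHAR_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode(label: str, max_len: int = 25) -> List[int]:
    """Map a label to max_len + 2 IDs: BOS, characters, EOS, then padding.

    Labels longer than max_len are truncated; characters outside CHARSET
    raise KeyError in this sketch.
    """
    ids = [CHAR_TO_ID[BOS]]
    ids += [CHAR_TO_ID[c] for c in label[:max_len]]
    ids.append(CHAR_TO_ID[EOS])
    ids += [CHAR_TO_ID[PAD]] * (max_len + 2 - len(ids))
    return ids


def decode(ids: List[int]) -> str:
    """Invert encode: stop at EOS, skip BOS and PAD."""
    chars = []
    for i in ids:
        tok = VOCAB[i]
        if tok == EOS:
            break
        if tok in (PAD, BOS):
            continue
        chars.append(tok)
    return "".join(chars)
```

Because every character is its own token, a single wrong, missing, or extra character in a label changes exactly one position or the sequence length, which is the granularity at which the label errors described above occur.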

🔎 Similar Papers

LEMoN: Label Error Detection using Multimodal Neighbors