🤖 AI Summary
This study addresses the limited robustness of end-to-end Spanish continuous visual speech recognition (lip-reading) under data scarcity, acoustic noise, and cross-speaker variability. We propose the first end-to-end Spanish lip-reading system, built on a Transformer architecture with a temporal modeling strategy adapted to low-resource conditions. Our approach integrates joint CTC–attention decoding, visual feature enhancement, and synthetic data augmentation. We also conduct the first systematic evaluation of Spanish lip-reading generalization across diverse realistic conditions, including visual ambiguity, inter-speaker articulatory variation, and silent frames. On the Spanish-LRS benchmark, our model achieves a word error rate (WER) of 38.2%, a 9.7-point absolute reduction over the baseline. Notably, it retains its ability to model the structure of the speech stream even in few-shot settings, demonstrating strong adaptability to challenging visual conditions.
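Two of the quantities mentioned above are easy to make concrete. WER is the word-level edit distance between a reference and a hypothesis transcript, normalized by the reference length, and joint CTC–attention decoding typically scores a hypothesis as a weighted interpolation of the two decoders' log-probabilities. The sketch below illustrates both; the interpolation weight `lam` and the function names are illustrative assumptions, not details taken from the paper.

```python
def word_error_rate(ref: str, hyp: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # match or substitute
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)


def hybrid_score(ctc_log_prob: float, att_log_prob: float, lam: float = 0.3) -> float:
    """Joint CTC-attention score for one hypothesis.

    lam is the CTC weight (0.3 here is a common choice in hybrid systems,
    assumed for illustration; the summary does not state the paper's value).
    """
    return lam * ctc_log_prob + (1.0 - lam) * att_log_prob
```

A WER of 38.2% thus means that, on average, roughly 38 word-level edits (substitutions, insertions, deletions) are needed per 100 reference words to match the model's output.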