AI based signage classification for linguistic landscape studies

📅 2025-10-26
🤖 AI Summary
Traditional linguistic landscape research relies on manual data collection and annotation, which is slow and scales poorly. This study proposes a human-AI collaborative framework, applied in Honolulu's Chinatown, that combines optical character recognition (OCR), language classification, and georeferenced image data with systematic human verification. The model reached an overall accuracy of 79%, and the authors identify five recurring types of mislabeling: distortion, reflection, degraded surfaces, graffiti, and hallucination. The model also picks up peripheral or background text that human interpreters typically ignore, which illustrates how AI behaves in complex real-world environments. The results suggest that larger-scale AI-assisted linguistic landscape analysis is feasible, provided human validation stays in the loop.

📝 Abstract
Linguistic Landscape (LL) research traditionally relies on manual photography and annotation of public signage to examine the distribution of languages in urban space. While such methods yield valuable findings, the process is time-consuming and difficult to scale to large study areas. This study explores the use of an AI-powered language detection method to automate LL analysis. Using Honolulu Chinatown as a case study, we constructed a georeferenced dataset of 1,449 photos collected by researchers and applied AI for optical character recognition (OCR) and language classification. We also validated the results manually to check accuracy. The model achieved an overall accuracy of 79%. Five recurring types of mislabeling were identified: distortion, reflection, degraded surface, graffiti, and hallucination. The analysis also reveals that the AI model treats all regions of an image equally, detecting peripheral or background text that human interpreters typically ignore. Despite these limitations, the results demonstrate the potential of AI-assisted workflows to reduce the time-consuming manual steps in LL research. However, given these limitations and mislabels, AI cannot be fully trusted in this process. This paper therefore encourages a hybrid approach that combines AI automation with human validation for a more reliable and efficient workflow.
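The manual validation step in the abstract amounts to comparing AI labels with human labels and tallying the kinds of mismatch. The Python sketch below shows one way that bookkeeping could look. It is not the authors' code: the record layout (`ai`, `human`, `error`), the sample rows, and the assumption of one record per validated item are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical validation records: the labels and error types below are
# illustrative placeholders, not data from the study.
records = [
    {"ai": "Chinese", "human": "Chinese", "error": None},
    {"ai": "English", "human": "English", "error": None},
    {"ai": "Japanese", "human": "Chinese", "error": "distortion"},
    {"ai": "English", "human": "English", "error": None},
    {"ai": "Korean", "human": None, "error": "hallucination"},
]

def summarize(records):
    """Return (overall accuracy, Counter of error types among mismatches)."""
    if not records:
        raise ValueError("no records to summarize")
    # A record counts as correct when the AI label equals the human label.
    correct = sum(1 for r in records if r["ai"] == r["human"])
    # Tally the annotated error type for every mismatching record.
    errors = Counter(r["error"] for r in records if r["ai"] != r["human"])
    return correct / len(records), errors

accuracy, errors = summarize(records)
print(f"accuracy: {accuracy:.0%}")
print(dict(errors))
```

With real annotations, the same tally would show which of the five mislabeling types dominates and therefore where human review effort is best spent.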
Problem

Research questions and friction points this paper is trying to address.

Automating linguistic landscape analysis using AI language detection methods
Addressing time-consuming manual signage classification in urban studies
Identifying AI limitations in text recognition for reliable landscape research
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI automates linguistic landscape analysis with OCR
Hybrid approach combines AI automation with human validation
Georeferenced dataset of 1,449 photos ties detected languages to locations
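To make the hybrid idea concrete, the Python sketch below routes OCR output either to automatic labeling or to a human reviewer. It is an illustration under stated assumptions, not the paper's pipeline: the paper does not describe its classifier here, so this sketch swaps in a simple Unicode script check from the standard library, and the function names, the `ocr_confidence` input, and the 0.8 threshold are hypothetical. Note also that script is only a proxy for language, since Han characters alone cannot separate Chinese from Japanese.

```python
import unicodedata
from collections import Counter

# Map Unicode character-name prefixes to coarse script groups.
# This is a rough heuristic, not a full language classifier.
SCRIPT_PREFIXES = {
    "CJK": "Han",
    "HIRAGANA": "Kana",
    "KATAKANA": "Kana",
    "HANGUL": "Hangul",
    "LATIN": "Latin",
}

def scripts_in(text):
    """Count letters per script group in a recognized text string."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, and punctuation
        name = unicodedata.name(ch, "")
        for prefix, script in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[script] += 1
                break
        else:
            counts["Other"] += 1  # letter from a script not listed above
    return counts

def needs_review(text, ocr_confidence, threshold=0.8):
    """Flag text for human validation (the threshold is an assumed value)."""
    counts = scripts_in(text)
    # Review when OCR is unsure, no letters were found, or the script is unknown.
    return ocr_confidence < threshold or not counts or "Other" in counts

print(dict(scripts_in("Open 營業中")))
print(needs_review("Open 營業中", 0.95))
print(needs_review("Open", 0.5))
```

Keeping the review rule in a separate function means the criteria can be tightened, for example to always review signs mixing several scripts, without touching the script-counting logic.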
Yuqin Jiang
Assistant Professor, University of Hawaii at Manoa
Geographic Information Science, High Performance Computing, CyberGIS
Song Jiang
University of Hawaiʻi at Mānoa
Jacob Algrim
University of Hawaiʻi at Mānoa
Trevor Harms
University of Hawaiʻi at Mānoa
Maxwell Koenen
University of Hawaiʻi at Mānoa
Xinya Lan
University of Hawaiʻi at Mānoa
Xingyu Li
University of Hawaiʻi at Mānoa
Chun-Han Lin
University of Hawaiʻi at Mānoa
Jia Liu
University of Hawaiʻi at Mānoa
Jiayang Sun
University of Hawaiʻi at Mānoa
Henry Zenger
University of Hawaiʻi at Mānoa