Indigenous Languages Spoken in Argentina: A Survey of NLP and Speech Resources

📅 2025-01-17
🤖 AI Summary
Problem: Indigenous languages in Argentina are severely endangered and lack both systematic language resource repositories and linguistically appropriate computational tools. Method: This study introduces the first nationally comprehensive classification framework and digital resource atlas, covering seven language families and over 30 Indigenous languages. It combines demographic data, NLP resources, and speech corpora to assess resources across regions and dialects, drawing on linguistic typology, metadata standardization, systematic literature review, and demographic analysis. Contribution/Results: The study delivers Argentina’s first national inventory of Indigenous language resources, identifying critical gaps in speech recognition, tokenization, and lexicography. Based on quantitative and qualitative evaluation, it proposes a tiered prioritization schema for resource development. The atlas establishes a scalable, empirically grounded methodology for endangered language documentation, digital preservation, and cultural revitalization, bridging linguistic scholarship and computational infrastructure for under-resourced languages.

📝 Abstract
Argentina has a diverse, yet little-known, Indigenous language heritage. Most of these languages are at risk of disappearing, resulting in a significant loss of world heritage and cultural knowledge. Currently, no unified information on speakers and computational tools is available for these languages. In this work, we present a systematization of the Indigenous languages spoken in Argentina, along with national demographic data on the country's Indigenous population. The languages are classified into seven families: Mapuche, Tupí-Guaraní, Guaycurú, Quechua, Mataco-Mataguaya, Aymara, and Chon. We also provide an introductory survey of the computational resources available for these languages, whether or not they are specifically developed for Argentine varieties.

Problem

Research questions and friction points this paper is trying to address.

Endangered Languages
Indigenous Knowledge
Computational Tools

Innovation

Methods, ideas, or system contributions that make the work stand out.

Indigenous Languages
Speech Technology
Language Preservation
Belu Ticona
George Mason University, United States; Departamento de Computación, FCEyN, Universidad de Buenos Aires (UBA), Argentina
Fernando Carranza
Departamento de Letras, FFyL, UBA, Argentina; Instituto de Filología y Literaturas Hispánicas “Dr. Amado Alonso”, UBA, Argentina
Viviana Cotik
Universidad de Buenos Aires
artificial intelligence, natural language processing, machine learning, data quality, data mining