Survey on Publicly Available Sinhala Natural Language Processing Tools and Research

2019-06-05
arXiv.org
Citations: 39
Influential: 3
AI Summary
The Sinhala NLP community suffers from fragmented resources, absence of systematic surveys, and lack of collaborative benchmarks. Method: This paper introduces the first dynamically updated, open-source panoramic survey of Sinhala NLP. Leveraging bibliometric analysis, automated crawling and classification of open-source tools, multidimensional metadata annotation, and continuous arXiv tracking, it systematically catalogs dozens of global Sinhala NLP projects and tools. Contribution/Results: It identifies critical technical gaps and reuse pathways, and innovatively establishes a sustainably maintained knowledge graph and collaborative benchmark suite, filling a key void in unified surveys for low-resource language NLP. The survey significantly enhances community visibility, reproducibility, and interoperability, and has become the central reference and coordination hub for Sinhala NLP researchers in Sri Lanka and worldwide.
Abstract
Sinhala is the native language of the Sinhalese people, who make up the largest ethnic group of Sri Lanka. The language belongs to the globe-spanning language tree, Indo-European. However, due to poverty in both linguistic and economic capital, Sinhala, in the perspective of Natural Language Processing tools and research, remains a resource-poor language which has neither the economic drive its cousin English has nor the sheer push of the law of numbers a language such as Chinese has. A number of research groups from Sri Lanka have noticed this dearth and the resultant dire need for proper tools and research for Sinhala natural language processing. However, due to various reasons, these attempts seem to lack coordination and awareness of each other. The objective of this paper is to fill that gap of a comprehensive literature survey of the publicly available Sinhala natural language tools and research so that the researchers working in this field can better utilize contributions of their peers. As such, we shall be uploading this paper to arXiv and updating it periodically to reflect the advances made in the field.
Problem

Research questions and friction points this paper is trying to address.

Lack of Sinhala NLP tools
Need for coordinated research
Survey of Sinhala NLP advancements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive Sinhala NLP survey
Periodic updates on arXiv
Improves coordination and mutual awareness among researchers