NeoN: A Tool for Automated Detection, Linguistic and LLM-Driven Analysis of Neologisms in Polish

📅 2025-05-21
🤖 AI Summary
This work addresses the low efficiency and limited scalability of manual neologism detection in Polish, particularly when tracking ongoing lexical change. We propose the first multilayered, automated framework for Polish neologism detection and analysis. The method integrates reference corpus comparison, context-aware lemmatization, orthographic normalization, frequency-based filtering, variant clustering, and a fine-tuned large language model (LLM) module to perform neologism identification, definition generation, domain classification, and sentiment annotation. Daily RSS monitoring and an interactive visualization interface support human-in-the-loop validation. Key contributions include: (i) the first linguistically grounded, Polish-specific rule set combined with LLMs; (ii) a significant reduction in manual verification effort while maintaining ≥92% precision; and (iii) ongoing tracking, structured output, and open-source availability, which together provide a scalable, reproducible platform for lexical innovation research.

📝 Abstract
We introduce NeoN, a tool for detecting and analyzing Polish neologisms. Unlike traditional dictionary-based methods, which require extensive manual review, NeoN combines reference corpora, Polish-specific linguistic filters, an LLM-driven precision-boosting filter, and daily RSS monitoring in a multi-layered pipeline. The system uses context-aware lemmatization, frequency analysis, and orthographic normalization to extract candidate neologisms while consolidating inflectional variants. Researchers can verify candidates through an intuitive interface with visualizations and filtering controls. An integrated LLM module automatically generates definitions and categorizes neologisms by domain and sentiment. Evaluations show that NeoN maintains high accuracy while significantly reducing manual effort, providing an accessible solution for tracking lexical innovation in Polish.
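
The candidate-extraction stage described in the abstract can be illustrated with a minimal sketch. This is not NeoN's implementation: the tiny `REFERENCE_LEXICON`, the `normalize` and `extract_candidates` helpers, the `min_freq` threshold, and the made-up word "selfiaczek" are all assumptions introduced here for illustration. Lemmatization is omitted, so the sketch covers only orthographic normalization, reference comparison, and frequency filtering.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical stand-in for a lexicon derived from a reference corpus.
REFERENCE_LEXICON = {"to", "jest", "nowy", "telefon", "i", "mam", "w", "pracy"}

# Runs of Unicode letters, optionally joined by hyphens.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def normalize(token: str) -> str:
    """Assumed normalization: NFC, lowercase, collapse 3+ repeated characters."""
    token = unicodedata.normalize("NFC", token).lower()
    return re.sub(r"(.)\1{2,}", r"\1", token)


def extract_candidates(texts, reference, min_freq=2):
    """Count normalized tokens absent from the reference lexicon and keep
    those seen at least `min_freq` times."""
    counts = Counter()
    for text in texts:
        for raw in TOKEN_RE.findall(text):
            word = normalize(raw)
            if word not in reference:
                counts[word] += 1
    return {word: n for word, n in counts.items() if n >= min_freq}


if __name__ == "__main__":
    sample = [
        "Mam nowy telefon i robię selfiaczek w pracy.",
        "Selfiaczek to jest hit!",
        "Selfiaaaczek",
    ]
    print(extract_candidates(sample, REFERENCE_LEXICON))
```

In this sketch the frequency threshold is what discards one-off tokens such as typos, while normalization lets expressive spellings count toward the same candidate. A real pipeline would lemmatize before the lexicon lookup so that inflected forms of known words are not flagged.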
Problem

Research questions and friction points this paper is trying to address.

Automated detection and analysis of Polish neologisms
Reducing manual effort in neologism identification
Providing linguistic and LLM-driven neologism categorization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines linguistic filters and LLM for neologism detection
Uses context-aware lemmatization and frequency analysis
Integrates LLM for automatic definition and categorization
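
The LLM-driven categorization step can be pictured as a structured record filled from a model reply. The sketch below is an assumption-laden illustration, not the tool's actual prompt or schema: the `NeologismAnnotation` fields, the `build_prompt` wording, the JSON reply format, and the three sentiment labels are all hypothetical, and no model is called.

```python
import json
from dataclasses import dataclass

# Assumed label set; the source does not list the sentiment categories.
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


@dataclass(frozen=True)
class NeologismAnnotation:
    """Hypothetical structured output for one candidate neologism."""
    lemma: str
    definition: str
    domain: str
    sentiment: str


def build_prompt(lemma: str, contexts: list[str]) -> str:
    """Assemble an illustrative prompt from a lemma and its usage contexts."""
    examples = "\n".join(f"- {c}" for c in contexts)
    return (
        f"Polish word: {lemma}\n"
        f"Usage examples:\n{examples}\n"
        'Reply with JSON only, using the keys "definition", "domain", '
        '"sentiment" (sentiment is one of: positive, neutral, negative).'
    )


def parse_annotation(lemma: str, reply: str) -> NeologismAnnotation:
    """Parse a JSON reply and reject sentiment labels outside the assumed set."""
    data = json.loads(reply)
    sentiment = str(data["sentiment"]).strip().lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}")
    return NeologismAnnotation(
        lemma=lemma,
        definition=str(data["definition"]).strip(),
        domain=str(data["domain"]).strip(),
        sentiment=sentiment,
    )
```

Validating the reply before storing it keeps malformed model output out of the structured results, which matters when a human reviewer later filters candidates by domain or sentiment.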
Aleksandra Tomaszewska
Institute of Computer Science, Polish Academy of Sciences
Dariusz Czerski
Institute of Computer Science, Polish Academy of Sciences
artificial intelligence
Bartosz Żuk
Institute of Computer Science, Polish Academy of Sciences
Maciej Ogrodniczuk
Institute of Computer Science, Polish Academy of Sciences