A New Semisupervised Technique for Polarity Analysis using Masked Language Models

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

Traditional spatial polarity models in sentiment analysis often suffer from low accuracy, weak interpretability, and poor consistency. This study proposes a novel polarity analysis method that integrates masked language modeling with semi-supervised learning, uniquely embedding word2vec within a Latent Semantic Scaling (LSS) framework. By estimating the contextual occurrence probabilities of seed words, the approach generates probabilistic polarity scores for both words and documents. Empirical evaluation on coverage of China-related and other international health topics in *China Daily* during the COVID-19 pandemic demonstrates that the proposed method significantly outperforms conventional models, confirming its effectiveness and innovation in enhancing the accuracy, interpretability, and consistency of polarity scoring.

📝 Abstract

I developed a new version of Latent Semantic Scaling (LSS) employing word2vec as a masked language model. Unlike original spatial models, it assigns polarity scores to words and documents as predicted probabilities of seed words to occur in given contexts. These probabilistic polarity scores are more accurate, interpretable and consistent than those spatial polarity models can produce in text analysis. I demonstrate these advantages by applying both probabilistic and spatial models to China Daily's coverage of China and other countries during the coronavirus disease (COVID) pandemic in terms of achievement in health issues. The result suggests that more advanced masked language models would further improve the semisupervised machine learning technique.

Problem

Research questions and friction points this paper is trying to address.

polarity analysis

semisupervised learning

masked language models

sentiment scoring

text analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

masked language model

Latent Semantic Scaling

semisupervised learning