Cross-lingual Opinions and Emotions Mining in Comparable Documents

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether sentiment and emotion are expressed consistently across English–Arabic comparable news documents.
Method: We label documents as subjective or objective with a cross-lingual method that does not rely on machine translation. We also build bilingual emotion lexicons by manually translating the English WordNet-Affect (WNA) into Arabic. These resources are used to annotate multi-source news corpora with opinion and emotion labels. A statistical measure then assesses how well the labels agree within each English–Arabic document pair.
Contribution/Results: To our knowledge, this is the first quantitative demonstration that English–Arabic document pairs from the same news agency show high sentiment consistency (p < 0.01), while pairs from different agencies diverge significantly. This suggests that the source of a document influences cross-lingual sentiment expression more strongly than its language does. The method is language-independent and can be generalized to other language pairs.
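The summary does not say how subjectivity labels are obtained without machine translation. One common way to do this is annotation projection. English sentences in a parallel corpus are labeled, each label is copied to the aligned Arabic sentence, and an Arabic classifier is trained on the projected labels. The sketch below assumes that technique. The cue list `EN_SUBJECTIVE_CUES`, the helpers `label_english` and `project_labels`, the `NaiveBayes` class, and the toy sentence pairs are all hypothetical illustrations. None of them is taken from the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical English cue words; a real setup would use a trained English
# subjectivity classifier or a full lexicon instead of this toy set.
EN_SUBJECTIVE_CUES = {"terrible", "great", "shocking", "hope", "fear", "believe"}


def label_english(sentence):
    """Label an English sentence 'subjective' if it contains any cue word."""
    tokens = sentence.lower().split()
    return "subjective" if any(t in EN_SUBJECTIVE_CUES for t in tokens) else "objective"


def project_labels(parallel_pairs):
    """Copy each English label onto its aligned Arabic sentence (no MT involved)."""
    return [(ar, label_english(en)) for en, ar in parallel_pairs]


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, labeled):
        self.class_counts = Counter(label for _, label in labeled)
        self.word_counts = defaultdict(Counter)
        for text, label in labeled:
            self.word_counts[label].update(text.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in text.split():
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy parallel corpus (invented sentences, for illustration only).
pairs = [
    ("the results were terrible", "كانت النتائج سيئة"),
    ("the meeting was held on monday", "عقد الاجتماع يوم الاثنين"),
    ("we hope for peace", "نأمل في السلام"),
    ("the minister arrived in cairo", "وصل الوزير إلى القاهرة"),
]
arabic_clf = NaiveBayes().fit(project_labels(pairs))
print(arabic_clf.predict("نأمل في السلام"))  # subjective
```

After training, the Arabic classifier needs no English text. This is what lets documents be labeled directly in each language. A realistic version would need far more parallel data and proper Arabic tokenization.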

📝 Abstract
Comparable texts are topic-aligned documents in multiple languages that are not direct translations. They are valuable for understanding how a topic is discussed across languages. This research studies differences in sentiments and emotions across English-Arabic comparable documents. First, texts are annotated with sentiment and emotion labels. We apply a cross-lingual method to label documents with opinion classes (subjective/objective), avoiding reliance on machine translation. To annotate with emotions (anger, disgust, fear, joy, sadness, surprise), we manually translate the English WordNet-Affect (WNA) lexicon into Arabic, creating bilingual emotion lexicons used to label the comparable corpora. We then apply a statistical measure to assess the agreement of sentiments and emotions in each source-target document pair. This comparison is especially relevant when the documents originate from different sources. To our knowledge, this aspect has not been explored in prior literature. Our study includes English-Arabic document pairs from Euronews, BBC, and Al-Jazeera (JSC). Results show that sentiment and emotion annotations align when articles come from the same news agency and diverge when they come from different ones. The proposed method is language-independent and generalizable to other language pairs.
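The lexicon-based emotion annotation described above can be sketched as a per-language word lookup that maps words to the six emotion classes. This sketch assumes simple whitespace tokenization and exact matching. The `LEXICON` entries are a hypothetical handful of words standing in for the WNA-derived lexicons, and `emotion_counts` and `dominant_emotion` are illustrative helper names.

```python
from collections import Counter

EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")

# Hypothetical toy entries standing in for the bilingual WNA-derived lexicons.
LEXICON = {
    "en": {"angry": "anger", "disgusting": "disgust", "afraid": "fear",
           "happy": "joy", "sad": "sadness", "surprised": "surprise"},
    "ar": {"غاضب": "anger", "مقرف": "disgust", "خائف": "fear",
           "سعيد": "joy", "حزين": "sadness", "مندهش": "surprise"},
}


def emotion_counts(text, lang):
    """Count lexicon hits per emotion class using whitespace tokens."""
    lexicon = LEXICON[lang]
    hits = Counter(lexicon[t] for t in text.lower().split() if t in lexicon)
    return {e: hits.get(e, 0) for e in EMOTIONS}


def dominant_emotion(text, lang):
    """Return the most frequent emotion (first in EMOTIONS order on ties),
    or None when no lexicon word matches."""
    counts = emotion_counts(text, lang)
    best = max(EMOTIONS, key=lambda e: counts[e])
    return best if counts[best] > 0 else None


print(emotion_counts("the people were sad and afraid", "en"))
print(dominant_emotion("الطفل حزين", "ar"))  # sadness
```

Exact matching misses inflected or cliticized Arabic forms. For example, the plural حزينين does not match حزين. A practical annotator would therefore normalize or stem tokens before the lookup.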
Problem

Research questions and friction points this paper is trying to address.

Analyze sentiment and emotion differences in English-Arabic comparable documents
Develop cross-lingual annotation that does not depend on machine translation
Assess how well sentiment and emotion labels agree between the two documents of each pair, across multiple news sources
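The per-pair comparison, split by whether both documents come from the same agency, can be illustrated by scoring each pair and averaging the scores by group. This sketch substitutes cosine similarity between emotion-count vectors as the per-pair measure. That choice is an assumption and not necessarily the statistic used in the paper. The function names and the example vectors are hypothetical.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agreement_by_source(pairs):
    """Average per-pair similarity, grouped into same vs. different agency.

    Each pair is (en_agency, ar_agency, en_vector, ar_vector); the vectors
    hold emotion counts in a fixed class order.
    """
    groups = {"same": [], "different": []}
    for en_src, ar_src, en_vec, ar_vec in pairs:
        key = "same" if en_src == ar_src else "different"
        groups[key].append(cosine(en_vec, ar_vec))
    return {k: mean(v) for k, v in groups.items() if v}


# Invented vectors for illustration only; these are not results from the paper.
example_pairs = [
    ("BBC", "BBC", [0, 0, 1, 0, 2, 0], [0, 0, 1, 0, 2, 0]),
    ("Euronews", "JSC", [3, 0, 0, 0, 0, 0], [0, 0, 0, 2, 0, 0]),
]
print(agreement_by_source(example_pairs))
```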
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual sentiment labeling without machine translation
Bilingual emotion lexicons built by manually translating the English WordNet-Affect into Arabic
Statistical measure of sentiment and emotion agreement within each source-target document pair
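The abstract does not name its agreement statistic. One standard chance-corrected measure for comparing labels assigned on the English side and the Arabic side is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement rate and p_e is the agreement expected by chance from each side's label frequencies. The sketch below uses kappa as an illustrative stand-in. It is not a claim about the paper's own measure, and the example labels are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # p_o: fraction of positions where both sides assign the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from each side's marginal label frequencies
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sides use one identical label everywhere
    return (observed - expected) / (1 - expected)


# Invented subjectivity labels for four aligned English/Arabic documents.
english = ["subj", "subj", "obj", "obj"]
arabic = ["subj", "obj", "obj", "obj"]
print(cohens_kappa(english, arabic))  # 0.5
```

Here p_o = 0.75 and p_e = (2·1 + 2·3) / 16 = 0.5, which gives κ = 0.5. The same function applies unchanged to the six emotion classes.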