🤖 AI Summary
This study investigates cross-lingual consistency of sentiment expression in English–Arabic comparable news documents.
Method: We propose a cross-lingual annotation framework that avoids machine translation. We build a bilingual emotion lexicon by manually translating the English WordNet-Affect (WNA) lexicon into Arabic, annotate multi-source news corpora with subjectivity and emotion labels, and then measure the statistical agreement of these labels within each document pair.
Contribution/Results: To our knowledge, this is the first quantitative demonstration that English–Arabic document pairs from the same news agency exhibit high sentiment consistency (p < 0.01), whereas pairs from different agencies diverge significantly. This indicates that source identity exerts a stronger influence on cross-lingual sentiment expression than language itself. The framework is language-agnostic and scalable, offering a reproducible pathway for sentiment analysis in low-resource languages.
📝 Abstract
Comparable texts are topic-aligned documents in multiple languages that are not direct translations. They are valuable for understanding how a topic is discussed across languages. This research studies differences in sentiments and emotions across English-Arabic comparable documents. First, texts are annotated with sentiment and emotion labels. We apply a cross-lingual method to label documents with opinion classes (subjective/objective), avoiding reliance on machine translation. To annotate with emotions (anger, disgust, fear, joy, sadness, surprise), we manually translate the English WordNet-Affect (WNA) lexicon into Arabic, creating bilingual emotion lexicons used to label the comparable corpora. We then apply a statistical measure to assess the agreement of sentiments and emotions in each source-target document pair. This comparison is especially relevant when the documents originate from different sources. To our knowledge, this aspect has not been explored in prior literature. Our study includes English-Arabic document pairs from Euronews, BBC, and Al-Jazeera (JSC). Results show that sentiment and emotion annotations align when articles come from the same news agency and diverge when they come from different ones. The proposed method is language-independent and generalizable to other language pairs.
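The lexicon-based emotion labeling and the pairwise agreement check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lexicon entries are toy examples rather than actual WNA content, the function names `label_emotion` and `cohen_kappa` are hypothetical, and Cohen's kappa is used only as a stand-in because the abstract does not name the statistical measure.

```python
from collections import Counter

# Toy bilingual emotion lexicon. These entries are hypothetical examples,
# not taken from WordNet-Affect or its Arabic translation.
LEXICON = {
    "en": {"angry": "anger", "fear": "fear", "happy": "joy", "sad": "sadness"},
    "ar": {"غاضب": "anger", "خوف": "fear", "سعيد": "joy", "حزين": "sadness"},
}


def label_emotion(text, lang):
    """Return the most frequent lexicon emotion in `text`, or 'none'.

    Tokenization is plain whitespace splitting, an assumption made to keep
    the sketch short; real Arabic text would need morphological processing.
    """
    lexicon = LEXICON[lang]
    counts = Counter(lexicon[tok] for tok in text.lower().split() if tok in lexicon)
    return counts.most_common(1)[0][0] if counts else "none"


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equal-length label sequences.

    Assumed stand-in for the unnamed agreement measure in the abstract.
    """
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the marginal label proportions per class.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both sequences use a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)
```

In use, each English document and its Arabic counterpart would be labeled separately with `label_emotion`, and the two resulting label sequences passed to `cohen_kappa`. Computing the score once for same-agency pairs and once for cross-agency pairs gives the kind of comparison the abstract reports.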