LACA: Improving Cross-lingual Aspect-Based Sentiment Analysis with LLM Data Augmentation

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cross-lingual aspect-based sentiment analysis (ABSA) faces challenges in transferring knowledge from a source to a target language, and existing approaches rely on error-prone machine translation that introduces substantial pseudo-label noise and semantic distortion. This paper proposes a translation-free pseudo-label generation framework: it leverages large language models (LLMs) with semantics-preserving prompt engineering to directly generate high-quality natural sentences in the target language along with corresponding fine-grained sentiment labels. Integrated with a two-stage self-training strategy, it constructs high-fidelity pseudo-labelled datasets without any target-language annotations. The method supports fine-tuning diverse backbone models, including generative ones, and achieves consistent improvements over translation-based state-of-the-art methods across six languages and five models. Results demonstrate strong generalizability and effectiveness: fine-tuned LLMs significantly outperform smaller multilingual models, confirming the advantage of direct, translation-free label generation.

📝 Abstract
Cross-lingual aspect-based sentiment analysis (ABSA) involves detailed sentiment analysis in a target language by transferring knowledge from a source language with available annotated data. Most existing methods depend heavily on often unreliable translation tools to bridge the language gap. In this paper, we propose a new approach that leverages a large language model (LLM) to generate high-quality pseudo-labelled data in the target language without the need for translation tools. First, the framework trains an ABSA model to obtain predictions for unlabelled target language data. Next, the LLM is prompted to generate natural sentences that better represent these noisy predictions than the original text. The ABSA model is then further fine-tuned on the resulting pseudo-labelled dataset. We demonstrate the effectiveness of this method across six languages and five backbone models, surpassing previous state-of-the-art translation-based approaches. The proposed framework also supports generative models, and we show that fine-tuned LLMs outperform smaller multilingual models.
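The two-stage loop described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: `model`, `predict_aspects`, and `llm_generate` are placeholder names for the source-trained ABSA model and the LLM generation call.

```python
def predict_aspects(model, sentence):
    # Stage 1: the ABSA model trained on source-language data produces
    # noisy (aspect, sentiment) predictions for an unlabelled target sentence.
    return model(sentence)

def build_pseudo_dataset(model, unlabelled_target, llm_generate):
    """Construct a pseudo-labelled target-language dataset without translation."""
    pseudo = []
    for sentence in unlabelled_target:
        labels = predict_aspects(model, sentence)
        # Stage 2: the LLM generates a natural target-language sentence that
        # faithfully expresses the predicted labels, so the text matches the
        # (possibly noisy) labels better than the original sentence did.
        new_sentence = llm_generate(labels)
        pseudo.append((new_sentence, labels))
    # The ABSA model is then further fine-tuned on this pseudo-labelled set.
    return pseudo
```

The key design point is that the labels, not the raw text, become the ground truth: regenerating the sentence from the labels removes the label-text mismatch that machine translation tends to introduce.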
Problem

Research questions and friction points this paper is trying to address.

Enhancing cross-lingual ABSA without translation tools
Generating pseudo-labelled data using LLMs for target language
Improving sentiment analysis accuracy across multiple languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM generates pseudo-labelled target language data
Requires no translation tools
Fine-tunes ABSA model with enhanced pseudo-data
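The prompt that asks the LLM to express a set of aspect-sentiment pairs as one natural target-language sentence might look like the following. The wording is an illustrative assumption, not the paper's actual template.

```python
def make_prompt(language, labels):
    """Build a generation prompt from noisy (aspect, sentiment) predictions.

    `language` is the target language name; `labels` is a list of
    (aspect_term, sentiment) pairs predicted by the ABSA model.
    """
    pairs = "; ".join(f"aspect '{a}' with {s} sentiment" for a, s in labels)
    return (
        f"Write one natural {language} sentence that expresses {pairs}. "
        "Use each aspect term verbatim and do not add other opinions."
    )
```

Constraining the LLM to reuse the aspect terms verbatim keeps the generated sentence aligned with the pseudo-labels, which is what makes the resulting dataset usable for fine-tuning.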
Jakub Šmíd
Department of Computer Science and Engineering, NTIS – New Technologies for the Information Society, University of West Bohemia in Pilsen, Faculty of Applied Sciences
Pavel Přibáň
Sentisquare, University of West Bohemia
Pavel Král
Department of Computer Science and Engineering, NTIS – New Technologies for the Information Society, University of West Bohemia in Pilsen, Faculty of Applied Sciences