🤖 AI Summary
Numerical app-store ratings often fail to capture the fine-grained sentiment expressed in user reviews. To address this, the paper proposes a Chain-of-Thought (CoT) prompting method for large language models (LLMs), which the authors present as the first systematic application of explicit stepwise reasoning to this task. They construct a manually annotated benchmark of 2,000 Amazon app reviews and design interpretable reasoning paths to help the model recover implicit sentiment intent. In controlled experiments against standard prompting, CoT prompting raises fine-grained sentiment classification accuracy from 84% to 93%, demonstrating the performance gain conferred by structured reasoning. The core contribution is the adaptation of the CoT paradigm to fine-grained sentiment classification, improving both predictive accuracy and decision interpretability.
📝 Abstract
We explore the use of Chain-of-Thought (CoT) prompting with large language models (LLMs) to improve the accuracy of fine-grained sentiment classification in app store reviews. Traditional numeric and polarity-based ratings often fail to capture the nuanced sentiment embedded in user feedback. We evaluate the effectiveness of CoT prompting versus simple prompting on 2,000 Amazon app reviews by comparing each method's predictions to human judgements. CoT prompting improved classification accuracy from 84% to 93%, highlighting the benefit of explicit reasoning in enhancing sentiment analysis performance.
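To make the contrast between the two prompting conditions concrete, the sketch below shows what a simple (direct) prompt and a Chain-of-Thought prompt might look like for this task. This is an illustrative assumption, not the paper's actual prompts: the label set, wording, and reasoning steps are hypothetical placeholders.

```python
# Illustrative sketch: contrasting a simple prompt with a Chain-of-Thought
# prompt for fine-grained app-review sentiment classification.
# NOTE: the label set and prompt wording below are hypothetical; the paper's
# exact prompts and label taxonomy are not reproduced here.

LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

def simple_prompt(review: str) -> str:
    """Direct classification: ask the model for a label with no reasoning."""
    return (
        f"Classify the sentiment of this app review as one of: "
        f"{', '.join(LABELS)}.\n"
        f"Review: {review}\n"
        f"Label:"
    )

def cot_prompt(review: str) -> str:
    """Chain-of-Thought: elicit stepwise reasoning before the final label."""
    return (
        f"Classify the sentiment of this app review as one of: "
        f"{', '.join(LABELS)}.\n"
        f"Review: {review}\n"
        f"Let's think step by step:\n"
        f"1. Which aspects of the app does the user mention?\n"
        f"2. What feeling does the user express about each aspect?\n"
        f"3. Weighing these, the overall sentiment label is:"
    )

example = "The new update is fast, but it keeps logging me out."
print(simple_prompt(example))
print(cot_prompt(example))
```

In the CoT condition, the model is asked to articulate its reasoning (aspects mentioned, feeling per aspect) before committing to a label, which is what the paper credits for the accuracy gain and the improved interpretability of individual predictions.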