🤖 AI Summary
This work addresses the limited interpretability of recommender systems by proposing an image-based method for generating personalized explanations from user-uploaded photos. The core idea is to formulate explainable recommendation as an image prediction task from the user's perspective: predicting the kind of photo a given user is likely to take of an item. To that end, it introduces an authorship probability framework for (user, photo) pairs that jointly models user preferences and explanation generation. The method combines convolutional neural networks (CNNs) for semantic image feature extraction with user embeddings that capture behavioral preferences, so that explanations are produced at the image level. The proposal is illustrated on TripAdvisor restaurant reviews with photos from six cities of different sizes. The authors argue that showing each user the image most likely to convince them makes recommendations more transparent and reliable, and that the estimated distribution of attractive images tells businesses which aspects of their products customers highlight.
📝 Abstract
Explaining the output of a complex system, such as a Recommender System (RS), is becoming critically important for both users and companies. In this paper we explore the idea that personalized explanations can be learned as recommendations themselves. Many online services let users upload photos in addition to rating items. We assume that users take these photos to reinforce or justify their opinions about the items. For this reason we try to predict what photo a user would take of an item, because that image is the argument that can best convince her of the item's qualities. In this sense, an RS can explain its results and, therefore, increase its reliability. Furthermore, once we have a model that predicts attractive images for users, we can estimate the distribution of those images. Companies thus gain vivid knowledge of the aspects of their products that customers highlight. The paper includes a formal framework that estimates the authorship probability for a given (user, photo) pair. To illustrate the proposal, we use data gathered from TripAdvisor containing the reviews (with photos) of restaurants in six cities of different sizes.
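
To make the authorship probability idea concrete, the following is a minimal sketch, not the paper's actual model. It assumes each photo is already described by a feature vector (for example, the output of a CNN) and each user by a learned embedding, and it scores a (user, photo) pair with a sigmoid over a dot product. The names `authorship_probability`, `rank_photos`, and `projection`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def authorship_probability(user_vec, photo_vec, projection):
    """Estimate how likely it is that this user would have taken this photo.

    Assumption (not from the paper): the photo features are linearly
    projected into the user embedding space, and the pair is scored by
    a dot product passed through a sigmoid.
    """
    # Project photo features (length d_photo) into user space (length d_user).
    projected = [sum(w * f for w, f in zip(row, photo_vec)) for row in projection]
    score = sum(u * p for u, p in zip(user_vec, projected))
    return sigmoid(score)


def rank_photos(user_vec, photos, projection):
    """Return photo ids ordered from most to least likely authorship."""
    scored = {
        photo_id: authorship_probability(user_vec, feats, projection)
        for photo_id, feats in photos.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy, hand-picked values; a real system would learn these from data.
projection = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
user_vec = [1.0, -0.5]
photos_of_item = {
    "dish": [2.0, 0.0, 1.0],
    "facade": [0.0, 2.0, 1.0],
    "menu": [0.5, 0.5, 0.0],
}

ranking = rank_photos(user_vec, photos_of_item, projection)
explanation_photo = ranking[0]  # the photo shown to this user as the explanation
```

Under these assumptions, the top-ranked photo of a recommended item serves as that user's personalized explanation. Aggregating the top-ranked photos over many users would give a rough picture of which kinds of images, and therefore which aspects of the item, customers tend to highlight.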