Optimal Dataset Size for Recommender Systems: Evaluating Algorithms' Performance via Downsampling

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the high energy consumption of recommender systems by systematically investigating, for the first time, the trade-off between energy efficiency and recommendation performance under dataset downsampling. We evaluate 12 state-of-the-art recommendation algorithms across seven public benchmark datasets, proposing a dual-user-and-item downsampling strategy coupled with core pruning. Evaluation adopts a multi-dimensional framework integrating nDCG@10 and carbon emissions (kgCO₂e) to jointly assess accuracy and environmental impact. Results show that 30% downsampling reduces training time by 52% and cuts carbon emissions by up to 51.02 kgCO₂e; at 50% downsampling, average performance remains at 81% of the full-data baseline, with several algorithms even surpassing it under progressive sampling. This work establishes a reproducible, empirically grounded paradigm for energy-aware recommendation—advancing green AI through actionable, quantified optimization guidelines and an open benchmark.
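The summary's accuracy metric, nDCG@10, can be made concrete with a minimal sketch. This is a generic implementation of binary-relevance nDCG@k, not code from the paper; the function name and argument layout are illustrative assumptions.

```python
import math

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """nDCG@k for a single user.

    ranked_items   -- the model's recommendation list, best first
    relevant_items -- set of held-out ground-truth items (binary relevance)
    """
    # Discounted cumulative gain over the top-k recommendations
    dcg = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant_items:
            dcg += 1.0 / math.log2(rank + 1)
    # Ideal DCG: every relevant item ranked at the top
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(r + 1) for r in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

Averaging this score over all test users gives the per-dataset nDCG@10 figure that the paper trades off against training time and kgCO₂e.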

📝 Abstract
This thesis investigates dataset downsampling as a strategy for optimizing energy efficiency in recommender systems while maintaining competitive performance. With growing dataset sizes posing computational and environmental challenges, the study explores the trade-offs between energy efficiency and recommendation quality in Green Recommender Systems, which aim to reduce environmental impact. By applying two downsampling approaches across seven datasets, 12 algorithms, and two levels of core pruning, the research demonstrates significant reductions in runtime and carbon emissions. For example, training on a 30% downsampling portion can reduce runtime by 52% compared to the full dataset, cutting carbon emissions by up to 51.02 kgCO₂e for a single algorithm on a single dataset. The analysis reveals that algorithm performance under different downsampling portions depends on dataset characteristics, algorithm complexity, and the specific downsampling configuration (i.e., it is scenario dependent). Algorithms with lower nDCG@10 scores than the top performers were less sensitive to the amount of training data, offering greater efficiency potential at lower downsampling portions: on average, they retained 81% of full-size performance using only 50% of the training set. In certain configurations, where more users were progressively included while the test set size was held fixed, they even exceeded their full-dataset nDCG@10 scores. These findings highlight the feasibility of balancing sustainability and effectiveness, providing guidance for designing energy-efficient recommender systems and promoting sustainable AI practices.
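The two methodological levers the abstract names, downsampling (of users and/or items) and k-core pruning, can be sketched as follows. This is a hypothetical illustration under assumed data shapes (a list of user-item interaction pairs), not the paper's actual pipeline; the function name, parameters, and defaults are all assumptions.

```python
import random
from collections import Counter

def downsample_and_prune(interactions, user_frac=0.5, item_frac=1.0,
                         core=5, seed=42):
    """Sample a fraction of users (and optionally items), then apply
    k-core pruning so every surviving user and item retains at least
    `core` interactions. `interactions` is a list of (user, item) pairs.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _ in interactions})
    items = sorted({i for _, i in interactions})
    kept_u = set(rng.sample(users, int(len(users) * user_frac)))
    kept_i = set(rng.sample(items, int(len(items) * item_frac)))
    data = [(u, i) for u, i in interactions if u in kept_u and i in kept_i]
    # Iterative k-core pruning: removing one sparse user can push an
    # item below the threshold, so repeat until a fixed point is reached.
    while True:
        u_cnt = Counter(u for u, _ in data)
        i_cnt = Counter(i for _, i in data)
        pruned = [(u, i) for u, i in data
                  if u_cnt[u] >= core and i_cnt[i] >= core]
        if len(pruned) == len(data):
            return pruned
        data = pruned
```

Training an algorithm on the output of `downsample_and_prune(data, user_frac=0.3)` versus the full `data` is the kind of comparison behind the reported 52% runtime reduction at the 30% portion.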
Problem

Research questions and friction points this paper is trying to address.

Optimize energy efficiency in recommender systems
Trade-offs between energy efficiency and recommendation quality
Balance sustainability and effectiveness in AI practices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Downsampling optimizes energy efficiency
Reduces runtime and carbon emissions
Balances sustainability and effectiveness