🤖 AI Summary
This work addresses the problem of maximizing expected topic coverage in streaming scenarios such as news recommendation, where users arrive at arbitrary times and accept each recommended item with a certain probability. The problem is formulated as submodular maximization under stochastic acceptance and cast as an online streaming optimization task subject to a matroid constraint. The authors propose a new single-pass algorithm that requires only an upper bound on the number of user visits and uses memory independent of the stream length. They prove a competitive ratio of $1/(8\delta)$ for the algorithm, where $\delta$ controls the approximation quality. Experiments show that the proposed method consistently outperforms the baselines.
📝 Abstract
We explore a novel problem in streaming submodular maximization, inspired by the dynamics of news-recommendation platforms. We consider a setting where users can visit a news website at any time, and upon each visit, the website must display up to $k$ news items. User interactions are inherently stochastic: the user consumes each displayed news item with a certain acceptance probability, and each news item covers certain topics. Our goal is to design a streaming algorithm that maximizes the expected total topic coverage. To address this problem, we establish a connection to submodular maximization subject to a matroid constraint. We show that previous methods can be adapted effectively to our problem when the number of user visits is known in advance or when memory linear in the stream length is available. However, in more realistic scenarios where only an upper bound on the number of visits and sublinear memory are available, these adapted methods fail to guarantee any bounded performance. To overcome these limitations, we introduce a new online streaming algorithm that achieves a competitive ratio of $1/(8\delta)$, where $\delta$ controls the approximation quality. Moreover, it requires only a single pass over the stream and uses memory independent of the stream length. Empirically, our algorithms consistently outperform the baselines.
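To make the objective concrete, the following is a minimal sketch of expected topic coverage for a single visit. It rests on assumptions that the abstract does not state: acceptances are independent across items, and a topic counts as covered when at least one accepted item covers it. The names `expected_coverage`, `greedy_slate`, `topics_of`, and `accept_prob` are hypothetical. The offline greedy selection shown here is only an illustration of the "at most $k$ items per visit" constraint; it is not the streaming algorithm proposed in the paper.

```python
from math import prod


def expected_coverage(shown, topics_of, accept_prob):
    """Expected number of distinct topics covered by the items in `shown`.

    Assumption: each item i is accepted independently with probability
    accept_prob[i], so topic t is covered with probability
    1 - prod(1 - accept_prob[i]) over the shown items that cover t.
    """
    all_topics = set().union(*(topics_of[i] for i in shown))
    total = 0.0
    for t in all_topics:
        # Probability that every shown item covering t is rejected.
        miss = prod(1.0 - accept_prob[i] for i in shown if t in topics_of[i])
        total += 1.0 - miss
    return total


def greedy_slate(candidates, topics_of, accept_prob, k):
    """Offline greedy choice of at most k items for one visit (illustrative)."""
    chosen = []
    for _ in range(k):
        base = expected_coverage(chosen, topics_of, accept_prob)
        best, best_gain = None, 0.0
        for i in candidates:
            if i in chosen:
                continue
            gain = expected_coverage(chosen + [i], topics_of, accept_prob) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining item adds expected coverage
            break
        chosen.append(best)
    return chosen


# Hypothetical toy instance: three items, three topics.
topics_of = {"a": {"politics", "economy"}, "b": {"politics"}, "c": {"sports"}}
accept_prob = {"a": 0.5, "b": 0.5, "c": 0.4}
slate = greedy_slate(["a", "b", "c"], topics_of, accept_prob, k=2)
```

In this toy instance the marginal gain of item `b` shrinks once `a` is already shown, because both cover the same topic. This diminishing-returns behavior is the submodularity that the matroid-constrained formulation relies on.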