AI Summary
This study addresses the low CDN cache hit rates and high origin bandwidth costs of short-video platforms. It identifies two properties of these platforms that a cache can exploit: push-based recommendation, which gives visibility into the requests a user will make next, and highly skewed content popularity, which produces geographical and temporal overlap in the videos being served. The work proposes a lookahead-aware caching system built on both properties. Evaluated through simulations of 10,000 simultaneous users driven by traces collected from real users, the proposed approach reduces midgress (CDN-to-origin) bandwidth costs by 11.1% to 111% compared to ten state-of-the-art heuristic and learning-based cache eviction policies.
Abstract
Short video platforms like TikTok, Instagram Reels, and YouTube Shorts have gained immense popularity in the last few years and are responsible for a large and growing fraction of Internet traffic. We identify two unique opportunities for improving short video delivery using their existing interactions with content delivery networks (CDNs). First, short video platforms use a push-based recommendation system, in which the user is presented with a sequence of videos recommended by the algorithm rather than explicitly picking content to watch (as on, e.g., YouTube). Such push-based systems offer a unique opportunity for system design by providing visibility into upcoming requests. Second, the popularity of these videos follows a highly skewed Pareto distribution, leading to geographical and temporal overlap among the videos being served. We leverage these opportunities to build SILC, a lookahead-aware caching system that aims to (i) reduce CDN cache miss rates and (ii) reduce midgress bandwidth between the CDN and the origin server. Our evaluation of SILC uses traces that we collect from real users through (i) an in-person user study and (ii) a data donation program involving 100 TikTok users across the world. Using a combination of these traces, we simulate traffic from 10,000 simultaneous users. Our evaluation shows that, compared to 10 state-of-the-art heuristic and learning-based cache eviction policies, SILC reduces a CDN's midgress costs by 11.1% to 111%.
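To make the idea of lookahead-aware caching concrete, the following is a minimal sketch, not SILC's actual algorithm, which the abstract does not specify. It assumes the cache can see the next `window` requests (as a push-based recommender could reveal) and evicts the cached video whose next use within that window is farthest away or absent, falling back to least-recently-used order on ties. The function names, the `window` parameter, and the toy trace are all hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity):
    """Count cache misses for a plain LRU cache (baseline)."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for video in requests:
        if video in cache:
            cache.move_to_end(video)  # mark as most recently used
            continue
        misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used video
        cache[video] = True
    return misses


def simulate_lookahead(requests, capacity, window):
    """Count misses when eviction can see the next `window` requests.

    Hypothetical policy for illustration only: evict the cached video whose
    next request inside the lookahead window is farthest away, treating
    videos that do not appear in the window as farthest of all.
    """
    cache = OrderedDict()
    misses = 0
    for i, video in enumerate(requests):
        if video in cache:
            cache.move_to_end(video)
            continue
        misses += 1
        if len(cache) >= capacity:
            upcoming = requests[i + 1 : i + 1 + window]
            # Position of the first upcoming request for each video.
            next_use = {}
            for pos, v in enumerate(upcoming):
                next_use.setdefault(v, pos)
            # The cache iterates from LRU to MRU and max() keeps the first
            # maximal item, so ties are resolved in favor of evicting the LRU.
            victim = max(cache, key=lambda v: next_use.get(v, window))
            del cache[victim]
        cache[video] = True
    return misses


if __name__ == "__main__":
    # Toy cyclic trace: a worst case for LRU with a cache of two videos.
    trace = ["a", "b", "c"] * 3
    print("LRU misses:      ", simulate_lru(trace, capacity=2))
    print("Lookahead misses:", simulate_lookahead(trace, capacity=2, window=3))
```

On this toy trace, LRU misses on all 9 requests, while the lookahead variant with a window of 3 misses on 6. With `window=0` the lookahead policy has no future information and degenerates to LRU. Real traces, variable video sizes, and the geographic overlap described above are outside the scope of this sketch.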