Enhancing Neural Adaptive Wireless Video Streaming via Lower-Layer Information Exposure and Online Tuning

📅 2025-01-02
📈 Citations: 1
Influential: 1
🤖 AI Summary
Existing DRL-based adaptive wireless video streaming methods suffer from limited QoE because they rely solely on high-level, delayed state representations. To address this, we propose a cross-layer awareness framework that integrates real-time physical- and link-layer states. We formulate an infinite-horizon discounted MDP that captures the trade-off between QoE and the cost of obtaining system information, design an enhanced A3C (eA3C) offline training algorithm that leverages lower-layer network states, and introduce two user-personalized online continual learning mechanisms for runtime policy refinement. Experiments show that the offline policy improves QoE by 6.8%–14.4% over state-of-the-art baselines; subsequent online tuning further boosts QoE by 6%–28%, significantly enhancing real-time responsiveness and personalization.
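The infinite-horizon discounted objective mentioned above can be written in standard MDP notation (illustrative notation, not necessarily the paper's) as:

```latex
\max_{\pi} \; J(\pi) \triangleq \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right], \quad \gamma \in (0,1),
```

where $s_t$ collects APP-layer, lower-layer, and past information, $a_t$ is the bitrate (quality) decision for the next chunk, and $r$ encodes QoE net of the cost of acquiring system information.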

📝 Abstract
Deep reinforcement learning (DRL) has demonstrated promising potential in adaptive video streaming and has recently received increasing attention. However, existing DRL-based methods for adaptive video streaming use only application (APP) layer information, adopt heuristic training methods, and train generalized neural networks with pre-collected data. This paper aims to boost the quality of experience (QoE) of adaptive wireless video streaming by using lower-layer information, deriving a rigorous training method, and adopting online tuning with real-time data. First, we formulate a more comprehensive and accurate adaptive wireless video streaming problem as an infinite-horizon discounted Markov decision process (MDP) by additionally incorporating past and lower-layer information, allowing a flexible tradeoff between QoE and the costs of obtaining system information and solving the problem. In the offline scenario (only with pre-collected data), we propose an enhanced asynchronous advantage actor-critic (eA3C) method that jointly optimizes the parameters of the parameterized policy and value function. Specifically, we build an eA3C network consisting of a policy network and a value network that can utilize cross-layer, past, and current information, and jointly train the eA3C network using pre-collected samples. In the online scenario (with additional real-time data), we propose two continual learning-based online tuning methods for designing better policies for a specific user with different QoE and training-time tradeoffs. Finally, experimental results show that the proposed offline policy improves QoE by 6.8%–14.4% compared to the state of the art in the offline scenario, and the proposed online policies achieve a further 6%–28% QoE gain over the proposed offline policy in the online scenario.
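To make the joint policy/value training concrete, here is a minimal single-worker actor-critic sketch in NumPy. It is not the paper's eA3C implementation: the cross-layer feature names, dimensions, reward proxy, and linear function approximators are all illustrative assumptions; eA3C uses neural networks and asynchronous workers. The sketch only shows the core idea the abstract describes: one TD error drives updates to both the policy and the value function over a state that concatenates APP-layer and lower-layer information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-layer state: APP-layer features (e.g. buffer level,
# last bitrate) concatenated with lower-layer features (e.g. SNR, PHY rate).
# Dimensions are illustrative, not taken from the paper.
STATE_DIM, N_BITRATES, GAMMA = 6, 4, 0.99

theta = rng.normal(scale=0.1, size=(STATE_DIM, N_BITRATES))  # policy params
w = rng.normal(scale=0.1, size=STATE_DIM)                    # value params

def policy(s):
    """Softmax policy over the discrete bitrate choices."""
    z = s @ theta
    z = z - z.max()                       # numerical stability
    p = np.exp(z)
    return p / p.sum()

def value(s):
    """Linear state-value estimate V(s) = w . s."""
    return s @ w

def ac_update(s, a, r, s_next, lr_pi=1e-2, lr_v=1e-2):
    """One actor-critic step: the TD error updates both the critic (value)
    and the actor (policy), echoing the joint optimization in the abstract."""
    global theta, w
    td = r + GAMMA * value(s_next) - value(s)   # one-step advantage estimate
    w = w + lr_v * td * s                       # critic: TD(0) update
    p = policy(s)
    grad_logp = -np.outer(s, p)                 # d log pi(a|s) / d theta
    grad_logp[:, a] += s
    theta = theta + lr_pi * td * grad_logp      # actor: policy gradient
    return td

# Toy rollout with random transitions; the reward is a crude QoE proxy
# that favors higher bitrate indices (purely for illustration).
s = rng.normal(size=STATE_DIM)
for _ in range(200):
    a = int(rng.choice(N_BITRATES, p=policy(s)))
    s_next = rng.normal(size=STATE_DIM)
    ac_update(s, a, a / (N_BITRATES - 1), s_next)
    s = s_next
```

The paper's online tuning methods would start from such offline-trained parameters and keep applying updates of this kind on a specific user's real-time samples, trading extra training time for per-user QoE gains.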
Problem

Research questions and friction points this paper is trying to address.

Deep Reinforcement Learning
Wireless Video Streaming
Limited Information Training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reinforcement Learning
Adaptive Strategies
Joint Learning
Lingzhi Zhao
University of Illinois at Urbana-Champaign
Ying Cui
IoT Thrust, The Hong Kong University of Science and Technology (Guangzhou), China; also with The Hong Kong University of Science and Technology, Hong Kong SAR, China
Yuhang Jia
Tencent Technology
Yunfei Zhang
Tencent Technology
Klara Nahrstedt
Computer Science, University of Illinois, Urbana-Champaign
Quality of Service, multimedia systems, distributed systems, networks, teleimmersion