🤖 AI Summary
VR-induced cybersickness limits adoption in healthcare, education, and other domains, yet objective, video-based assessment methods remain underexplored. This paper proposes a purely video-driven framework for classifying cybersickness severity. It extracts frame-level visual features with an ImageNet-pretrained InceptionV3 backbone and models temporal dynamics with an LSTM, enabling prediction from gameplay footage alone. By pairing static visual representations with explicit temporal modeling, the approach captures how cybersickness symptoms evolve over the course of a VR session, improving video-level inference of user comfort. Evaluated on a public VR gameplay dataset, the method achieves 68.4% classification accuracy, outperforming existing models trained solely on video data. This offers a practical, sensor-free, low-cost route to quality-of-experience assessment in VR.
📝 Abstract
With the rapid advancement of virtual reality (VR) technology, its adoption across domains such as healthcare, education, and entertainment has grown significantly. However, the persistent issue of cybersickness, marked by symptoms resembling motion sickness, continues to hinder widespread acceptance of VR. While recent research has explored multimodal deep learning approaches leveraging data from integrated VR sensors such as eye and head tracking, there remains limited investigation into the use of video-based features for predicting cybersickness. In this study, we address this gap by utilizing transfer learning to extract high-level visual features from VR gameplay videos using the InceptionV3 model pretrained on the ImageNet dataset. These features are then passed to a Long Short-Term Memory (LSTM) network to capture the temporal dynamics of the VR experience and predict cybersickness severity over time. Our approach effectively leverages the time-series nature of video data, achieving 68.4% classification accuracy for cybersickness severity. This surpasses the performance of existing models trained solely on video data, providing a practical tool for VR developers to evaluate and mitigate cybersickness in virtual environments. Furthermore, this work lays the foundation for future research on video-based temporal modeling for enhancing user comfort in VR applications.
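The pipeline the abstract describes (a frozen CNN producing per-frame features, followed by an LSTM that summarizes the sequence for a severity classifier) can be sketched as below. This is a minimal illustration, not the authors' implementation: the random vectors stand in for InceptionV3's 2048-d pooled activations, the sequence length, hidden size, and three severity classes are assumptions, and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: T frames per clip, D-dim InceptionV3 pool features,
# LSTM hidden size HID, C cybersickness severity classes.
T, D, HID, C = 16, 2048, 64, 3

def lstm_forward(x, Wx, Wh, b):
    """Run a single-layer LSTM over a (T, D) feature sequence; return the last hidden state."""
    n = Wh.shape[0]
    h = np.zeros(n)
    c = np.zeros(n)
    for t in range(x.shape[0]):
        z = x[t] @ Wx + h @ Wh + b          # (4n,) gate pre-activations
        i, f, g, o = np.split(z, 4)
        i = 1.0 / (1.0 + np.exp(-i))        # input gate
        f = 1.0 / (1.0 + np.exp(-f))        # forget gate
        o = 1.0 / (1.0 + np.exp(-o))        # output gate
        g = np.tanh(g)                      # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Stand-in for per-frame InceptionV3 features extracted from a gameplay clip.
features = rng.standard_normal((T, D))

# Untrained LSTM parameters (small random init).
Wx = rng.standard_normal((D, 4 * HID)) * 0.01
Wh = rng.standard_normal((HID, 4 * HID)) * 0.01
b = np.zeros(4 * HID)

h_last = lstm_forward(features, Wx, Wh, b)

# Linear + softmax head over the severity classes.
W_out = rng.standard_normal((HID, C)) * 0.1
logits = h_last @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape, float(probs.sum()))
```

In the actual system these pieces would be a pretrained `InceptionV3` feature extractor and a trained recurrent layer; the sketch only shows how frame-level features flow through temporal modeling into a video-level severity prediction.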