Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

📅 2026-06-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the limits of static intrusion detection in vehicular networks, where adaptive adversaries and partial observability undermine defense. The authors formulate defense as a partially observable sequential attack-defense game and introduce a quantum-inspired, non-Bayesian amplitude-state belief representation that captures the defender's uncertainty about attacker intent. This belief is integrated into a Proximal Policy Optimization (PPO) framework to make cost-aware, dynamic defense decisions. In simulation, the proposed method reduces cumulative mean damage, damage variance, and attack success rate by 60.4%, 90.2%, and 50.0%, respectively, and improves system survival probability by 46.4% compared to baseline approaches, indicating more effective and robust defense under partial observability.
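The four reported quantities can be made concrete with a small sketch. The summary does not give the paper's exact metric definitions, so everything here is an assumption for illustration: the episode record fields (`damage`, `attacks`, `successes`, `survived`), the helper names `summarize` and `relative_reduction`, ASR taken as successful attacks over attempted attacks, and survival probability taken as the fraction of episodes the system survives.

```python
from statistics import mean, pvariance

def summarize(episodes):
    """Aggregate hypothetical per-episode logs into the four metrics.

    Each episode is a dict with assumed fields:
      damage    - list of per-step damage values
      attacks   - number of attack attempts
      successes - number of successful attacks
      survived  - True if the system was never compromised
    """
    cumulative = [sum(ep["damage"]) for ep in episodes]
    attempts = sum(ep["attacks"] for ep in episodes)
    successes = sum(ep["successes"] for ep in episodes)
    survived = [1.0 if ep["survived"] else 0.0 for ep in episodes]
    return {
        "mean_damage": mean(cumulative),
        "damage_variance": pvariance(cumulative),
        "asr": successes / attempts if attempts else 0.0,
        "survival": mean(survived),
    }

def relative_reduction(baseline, method):
    """Percentage reduction of `method` relative to `baseline`."""
    return 100.0 * (baseline - method) / baseline

# Made-up logs, only to show the call shape.
logs = [
    {"damage": [1.0, 2.0], "attacks": 4, "successes": 1, "survived": True},
    {"damage": [3.0, 0.0], "attacks": 6, "successes": 1, "survived": False},
]
stats = summarize(logs)
print(stats)
print(relative_reduction(10.0, 4.0))
```

Under these assumptions, a figure such as the 60.4% damage reduction would correspond to `relative_reduction(baseline_mean_damage, method_mean_damage)`; the underlying baseline values are not stated in this summary.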
📝 Abstract
The Internet of Vehicles (IoV) faces a dynamic, adversarial security environment where attackers adapt to defenses. Existing intrusion detection systems rely on static classifiers that fail to capture sequential decision-making, attacker adaptation, and uncertainty. We formulate IoV security as a sequential attacker-defender interaction and model defense as a reinforcement learning problem under partial observability. We propose Quantum Belief-Integrated Reinforcement Defense (Q-BIRD), which uses a quantum-inspired belief representation to encode defender uncertainty about hidden attacker intent via amplitude-based states, enabling non-Bayesian belief evolution. Integrated into a Proximal Policy Optimization (PPO) defender, Q-BIRD selects cost-aware mitigation actions. In simulated environments with adaptive, probing attackers, Q-BIRD reduced cumulative mean damage, damage variance, and attack success rate (ASR) by 60.4%, 90.2%, and 50.0%, respectively, while increasing survival probability by 46.4%. Compared to classical Bayesian PPO, damage variance reduction and ASR improved by 10.2 times and 50%, respectively. Ablation and explainability analyses confirm that amplitude-based belief is the primary decision signal during strategy transitions when classical belief collapses, providing superior IoV security without additional hardware.
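The contrast between a classical belief that collapses and an amplitude-based belief can be illustrated with a minimal sketch. This is not the Q-BIRD update rule, which the abstract does not specify. It assumes two hidden attacker intents, real-valued amplitudes whose squares give probabilities, and a hypothetical mixing rotation (`mix_angle`) applied before the evidence reweighting; the function names are likewise illustrative.

```python
import math

def bayes_update(belief, likelihood):
    """Classical Bayes rule over two hidden attacker intents."""
    unnorm = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def amplitude_update(amps, likelihood, mix_angle=0.2):
    """Hypothetical amplitude-state update (an assumption, not the paper's rule).

    1. Rotate the two amplitudes by mix_angle, so mass can flow
       between intents even when one amplitude is exactly zero.
    2. Reweight each amplitude by sqrt(likelihood) and renormalize
       so the squared amplitudes sum to one.
    With mix_angle = 0 the squared result equals the Bayes posterior.
    """
    a0, a1 = amps
    c, s = math.cos(mix_angle), math.sin(mix_angle)
    r0, r1 = c * a0 - s * a1, s * a0 + c * a1
    w0 = r0 * math.sqrt(likelihood[0])
    w1 = r1 * math.sqrt(likelihood[1])
    norm = math.hypot(w0, w1)
    return [w0 / norm, w1 / norm]

def probabilities(amps):
    """Squared amplitudes, read as a probability over intents."""
    return [a * a for a in amps]

# A belief fully collapsed onto intent 0, then evidence favoring intent 1.
collapsed = [1.0, 0.0]
evidence = [0.1, 0.9]
print(bayes_update(collapsed, evidence))                      # remains on intent 0
print(probabilities(amplitude_update(collapsed, evidence)))  # some mass on intent 1
```

In this toy setting the Bayesian posterior cannot leave a collapsed state because a zero prior multiplies away any evidence, while the rotated amplitude state regains weight on the second intent. In a PPO defender, such a probability vector (or the amplitudes themselves) would presumably be fed to the policy network alongside the raw observation.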
Problem

Research questions and friction points this paper is trying to address.

Partially Observable Environment
Adaptive Attackers
Uncertainty Modeling
Sequential Decision-Making
Autonomous Cyber Defense
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum-Inspired Belief
Partially Observable Reinforcement Learning
Adaptive Cyber Defense
Amplitude-Based State Representation
Internet of Vehicles Security