Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the limitations of static intrusion detection in vehicular networks, where adaptive adversaries and partial observability undermine defense efficacy. The authors formulate security defense as a partially observable, sequential attack-defense game. They introduce a quantum-inspired, non-Bayesian belief representation based on amplitude states, which captures the defender's uncertainty about attacker intent. This belief is integrated into a Proximal Policy Optimization (PPO) framework to enable cost-aware, dynamic defense decisions. In a simulated environment, the proposed method reduces cumulative mean damage, damage variance, and attack success rate by 60.4%, 90.2%, and 50.0%, respectively, and improves system survival probability by 46.4% compared to baseline approaches. These results indicate substantially better defensive effectiveness and robustness under partial observability.
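The summary does not spell out the belief update rule. The following is a minimal sketch, under stated assumptions, of what an amplitude-based belief over hidden attacker intents can look like. The assumptions are as follows:

- There are three hypothetical intents.
- The belief is a normalized complex amplitude vector whose squared magnitudes give intent probabilities (the Born rule).
- A prediction step applies a unitary `exp(-i*H*dt)` built from a hypothetical Hermitian coupling matrix `H`.
- An observation step scales each amplitude by the square root of a hypothetical likelihood `P(obs | intent)`.

Because the unitary acts on amplitudes, the predicted probabilities contain interference terms rather than being a convex mixture. That is one way belief evolution can depart from Bayesian filtering. This is not presented as Q-BIRD's actual rule. All function names are illustrative, and a plain Bayesian update is included for comparison.

```python
import numpy as np

# Hypothetical intent labels; the paper's actual intent space is not specified here.
INTENTS = ["benign", "probe", "exploit"]


def born_probabilities(psi: np.ndarray) -> np.ndarray:
    """Born rule: the probability of intent k is |psi_k|^2."""
    return np.abs(psi) ** 2


def evolution_unitary(hamiltonian: np.ndarray, dt: float) -> np.ndarray:
    """U = exp(-i * H * dt) for a Hermitian H, built from its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(hamiltonian)
    return eigvecs @ np.diag(np.exp(-1j * dt * eigvals)) @ eigvecs.conj().T


def amplitude_update(psi: np.ndarray, unitary: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """One belief step: unitary evolution, then amplitude reweighting by sqrt(likelihood)."""
    psi = unitary @ psi                   # norm-preserving; amplitudes can interfere
    psi = psi * np.sqrt(likelihood)       # condition on the current observation
    return psi / np.linalg.norm(psi)      # renormalize so probabilities sum to 1


def bayes_update(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """Classical Bayesian update with no transition model, for comparison."""
    posterior = prior * likelihood
    return posterior / posterior.sum()


if __name__ == "__main__":
    # Hypothetical coupling between neighbouring intents (real symmetric, hence Hermitian).
    coupling = 0.3 * np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=complex)
    U = evolution_unitary(coupling, dt=1.0)

    psi = np.ones(3, dtype=complex) / np.sqrt(3)   # uniform initial amplitudes
    prior = np.ones(3) / 3                          # uniform classical belief

    # Hypothetical observation likelihoods P(obs | intent): probing, then exploitation.
    probe_obs = np.array([0.1, 0.7, 0.2])
    exploit_obs = np.array([0.05, 0.15, 0.8])

    for t, lik in enumerate([probe_obs] * 5 + [exploit_obs] * 3):
        psi = amplitude_update(psi, U, lik)
        prior = bayes_update(prior, lik)
        print(t, np.round(born_probabilities(psi), 3), np.round(prior, 3))
```

When the identity matrix is used as the unitary, the observation step alone reproduces the Bayesian posterior. Any difference between the two printed beliefs in this sketch therefore comes from the unitary prediction step.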

📝 Abstract

The Internet of Vehicles (IoV) faces a dynamic, adversarial security environment where attackers adapt to defenses. Existing intrusion detection systems rely on static classifiers that fail to capture sequential decision-making, attacker adaptation, and uncertainty. We formulate IoV security as a sequential attacker-defender interaction and model defense as a reinforcement learning problem under partial observability. We propose Quantum Belief-Integrated Reinforcement Defense (Q-BIRD). It uses a quantum-inspired belief representation that encodes defender uncertainty about hidden attacker intent as amplitude-based states, enabling non-Bayesian belief evolution. Integrated into a Proximal Policy Optimization (PPO) defender, Q-BIRD selects cost-aware mitigation actions. In simulated environments with adaptive, probing attackers, Q-BIRD reduced cumulative mean damage, damage variance, and attack success rate (ASR) by 60.4%, 90.2%, and 50.0%, respectively, while increasing survival probability by 46.4%. Compared with classical Bayesian PPO, Q-BIRD improved damage-variance reduction by a factor of 10.2 and ASR by 50%. Ablation and explainability analyses show that amplitude-based belief is the primary decision signal during strategy transitions, when classical belief collapses. Q-BIRD thus strengthens IoV security without requiring additional hardware.
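The following is a minimal sketch of how episode-level metrics of this kind, and the percentage changes reported above, could be computed from simulation logs. It makes several assumptions:

- `EpisodeLog`, `summarize`, and `relative_change` are hypothetical names.
- Attack success and survival are modelled as per-episode boolean flags.
- Damage variance is the population variance of per-episode cumulative damage.

The abstract does not say whether the 46.4% survival gain is relative or in percentage points, so the sketch shows only the relative form.

```python
from dataclasses import dataclass
from typing import Dict, List

import numpy as np


@dataclass
class EpisodeLog:
    step_damage: List[float]   # damage incurred at each time step of one episode
    attack_succeeded: bool     # hypothetical flag: the attacker reached its goal
    survived: bool             # hypothetical flag: the system stayed operational to the horizon


def summarize(episodes: List[EpisodeLog]) -> Dict[str, float]:
    """Aggregate per-episode logs into mean damage, damage variance, ASR, and survival."""
    totals = np.array([sum(ep.step_damage) for ep in episodes])  # cumulative damage per episode
    return {
        "mean_damage": float(totals.mean()),
        "damage_variance": float(totals.var()),  # population variance (ddof=0)
        "asr": float(np.mean([ep.attack_succeeded for ep in episodes])),
        "survival": float(np.mean([ep.survived for ep in episodes])),
    }


def relative_change(baseline: float, method: float) -> float:
    """Percentage change of `method` relative to `baseline`; negative means a reduction."""
    return 100.0 * (method - baseline) / baseline
```

Under these definitions, "reduced mean damage by 60.4%" corresponds to `relative_change(baseline_mean, method_mean)` returning about -60.4.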

Problem

Research questions and friction points this paper is trying to address.

Partially Observable Environment

Adaptive Attackers

Uncertainty Modeling

Sequential Decision-Making

Autonomous Cyber Defense

Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantum-Inspired Belief

Partially Observable Reinforcement Learning

Adaptive Cyber Defense