PHASE: Passive Human Activity Simulation Evaluation

📅 2025-07-17

🤖 AI Summary

Current cybersecurity simulation environments (e.g., cyber ranges, honeypots) lack quantitative methods to assess the realism of synthetic user behavior. To address this, we propose PHASE, a passive machine learning framework that distinguishes human from synthetic network activity using Zeek connection logs, without intrusive instrumentation. PHASE uses local DNS query records to label network traffic and applies SHAP-based explainability analysis to identify the temporal and behavioral signatures that characterize human users. Experimental evaluation shows that PHASE detects non-human activity with over 90% accuracy and yields insights that guide synthetic traffic generation, significantly improving its human-likeness. PHASE thus provides an explainable, non-intrusive means of quantifying synthetic user fidelity in operational simulation environments.

📝 Abstract

Cybersecurity simulation environments, such as cyber ranges, honeypots, and sandboxes, require realistic human behavior to be effective, yet no quantitative method exists to assess the behavioral fidelity of synthetic user personas. This paper presents PHASE (Passive Human Activity Simulation Evaluation), a machine learning framework that analyzes Zeek connection logs and distinguishes human from non-human activity with over 90% accuracy. PHASE operates entirely passively, relying on standard network monitoring without any user-side instrumentation or visible signs of surveillance. All network activity used for machine learning is collected via a Zeek network appliance to avoid introducing unnecessary network traffic or artifacts that could disrupt the fidelity of the simulation environment. The paper also proposes a novel labeling approach that utilizes local DNS records to classify network traffic, thereby enabling machine learning analysis. Furthermore, we apply SHAP (SHapley Additive exPlanations) analysis to uncover temporal and behavioral signatures indicative of genuine human users. In a case study, we evaluate a synthetic user persona and identify distinct non-human patterns that undermine behavioral realism. Based on these insights, we develop a revised behavioral configuration that significantly improves the human-likeness of synthetic activity, yielding a more realistic and effective synthetic user persona.
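To make the idea of temporal signatures in connection logs concrete, here is a minimal sketch that reads Zeek's default tab-separated log format and computes per-host inter-arrival statistics. The feature names (`mean_gap`, `stdev_gap`), the helper functions, and the sample records are illustrative assumptions; the abstract does not state which features PHASE actually uses.

```python
import statistics
from collections import defaultdict


def parse_zeek_tsv(text):
    """Parse Zeek's default TSV log format into a list of dicts keyed by field name."""
    fields, rows = [], []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # The "#fields" header line names the columns of every record.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def interarrival_features(rows):
    """Hypothetical features: mean and standard deviation of gaps between
    consecutive connections started by each originating host."""
    times = defaultdict(list)
    for row in rows:
        times[row["id.orig_h"]].append(float(row["ts"]))
    features = {}
    for host, ts in times.items():
        ts.sort()
        gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
        if len(gaps) < 2:
            continue  # too few connections to estimate variability
        features[host] = {
            "mean_gap": statistics.mean(gaps),
            "stdev_gap": statistics.stdev(gaps),
        }
    return features


# Made-up records: 10.0.0.5 connects on a fixed 10-second timer,
# while 10.0.0.7 connects at irregular intervals.
SAMPLE = "\n".join([
    "#fields\tts\tuid\tid.orig_h\tid.resp_h",
    "1000.0\tC1\t10.0.0.5\t93.184.216.34",
    "1010.0\tC2\t10.0.0.5\t93.184.216.34",
    "1020.0\tC3\t10.0.0.5\t93.184.216.34",
    "1000.0\tC4\t10.0.0.7\t93.184.216.34",
    "1003.0\tC5\t10.0.0.7\t93.184.216.34",
    "1050.0\tC6\t10.0.0.7\t93.184.216.34",
])

if __name__ == "__main__":
    for host, feats in interarrival_features(parse_zeek_tsv(SAMPLE)).items():
        print(host, feats)
```

In this toy data, the perfectly regular host has zero variance in its gaps, which is the kind of timing regularity a classifier could plausibly learn to associate with scripted activity.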

Problem

Research questions and friction points this paper is trying to address.

No quantitative method exists to assess the behavioral fidelity of synthetic users

A passive framework is needed to distinguish human from non-human activity

Cybersecurity simulations require more realistic synthetic user behavior

Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning framework analyzes Zeek logs

Passive monitoring with no user-side instrumentation

Novel DNS-based labeling for traffic classification
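One plausible reading of DNS-based labeling can be sketched as follows: build a map from resolved IP addresses to the queried domain using DNS log records, then attach that domain to each connection by its responder address. This is an assumption about how such labeling might work, not the paper's documented procedure. The field names (`query`, `answers`, `uid`, `id.resp_h`) follow Zeek's standard `dns.log` and `conn.log`; the helper functions and sample rows are hypothetical.

```python
def build_ip_to_domain(dns_rows):
    """Map each answer in a DNS record to the domain name that was queried.

    Zeek writes "-" for unset fields and joins multiple answers with commas
    in its TSV output. Answers may also contain CNAME targets; mapping them
    is harmless here because only IP addresses are looked up later.
    """
    ip_to_domain = {}
    for row in dns_rows:
        query = row.get("query", "-")
        answers = row.get("answers", "-")
        if query == "-" or answers == "-":
            continue
        for answer in answers.split(","):
            ip_to_domain[answer] = query
    return ip_to_domain


def label_connections(conn_rows, ip_to_domain):
    """Pair each connection uid with the domain its responder IP resolved from."""
    return [
        (row["uid"], ip_to_domain.get(row["id.resp_h"], "unlabeled"))
        for row in conn_rows
    ]


# Made-up records standing in for parsed dns.log and conn.log rows.
dns_rows = [
    {"query": "mail.example.com", "answers": "192.0.2.10"},
    {"query": "www.example.org", "answers": "cdn.example.org,192.0.2.20"},
    {"query": "unanswered.example.net", "answers": "-"},
]
conn_rows = [
    {"uid": "C1", "id.resp_h": "192.0.2.10"},
    {"uid": "C2", "id.resp_h": "192.0.2.20"},
    {"uid": "C3", "id.resp_h": "198.51.100.7"},
]

if __name__ == "__main__":
    print(label_connections(conn_rows, build_ip_to_domain(dns_rows)))
```

A real pipeline would also need to account for DNS record lifetimes and for addresses shared by several domains, both of which this sketch ignores.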
