🤖 AI Summary
To address the susceptibility of safety-critical sequential decision-making systems to learning unsafe behaviors, this paper proposes a curiosity-driven black-box fuzzing framework for efficiently discovering diverse crash-triggering scenarios. It combines an intrinsic curiosity mechanism with a multi-objective seed selection strategy to balance exploration of novel states against fault triggering. Technically, it incorporates deep learning–based mutation generation, prediction-uncertainty–guided novelty measurement, and Pareto-optimal seed scheduling. Experiments across multiple mainstream sequential decision models show that the approach outperforms existing state-of-the-art methods, with a 23.6% improvement in fault detection rate and a 41.2% increase in crash-scenario diversity. The diverse failure cases it generates also support subsequent model repair and robustness enhancement.
📝 Abstract
Sequential decision-making processes (SDPs) are fundamental to complex real-world challenges such as autonomous driving, robotic control, and traffic management. Although recent advances in Deep Learning (DL) have produced mature solutions to these problems, the resulting sequential decision makers (SDMs) remain vulnerable to learning unsafe behaviors, which poses significant risks in safety-critical applications. Developing a testing framework for SDMs that can identify a diverse set of crash-triggering scenarios remains an open challenge. To address this, we propose CureFuzz, a novel curiosity-driven black-box fuzz testing approach for SDMs. CureFuzz introduces a curiosity mechanism that lets the fuzzer explore novel and diverse scenarios effectively, improving the detection of crash-triggering scenarios. We also introduce a multi-objective seed selection technique that balances the exploration of novel scenarios against the generation of crash-triggering scenarios, thereby optimizing the fuzzing process. We evaluate CureFuzz on various SDMs, and the experimental results show that it outperforms the state-of-the-art method by a substantial margin in both the total number of faults and the number of distinct types of crash-triggering scenarios. We also demonstrate that the crash-triggering scenarios found by CureFuzz can be used to repair SDMs, highlighting CureFuzz as a valuable tool for testing SDMs and optimizing their performance.
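To make the multi-objective seed selection idea concrete, the following is a minimal sketch that assumes the two objectives are combined through Pareto dominance, as the summary's mention of Pareto-optimal seed scheduling suggests. The abstract does not give the actual scoring functions or the selection rule, so the `Seed` fields, the score values, and the uniform sampling over the front are all hypothetical illustrations, not CureFuzz's implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Seed:
    name: str
    novelty: float      # hypothetical curiosity score: higher = less familiar scenario
    crash_score: float  # hypothetical score: higher = closer to triggering a crash


def dominates(a: Seed, b: Seed) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.novelty >= b.novelty and a.crash_score >= b.crash_score
    better = a.novelty > b.novelty or a.crash_score > b.crash_score
    return no_worse and better


def pareto_front(seeds: list[Seed]) -> list[Seed]:
    """Keep the seeds that no other seed dominates."""
    return [s for s in seeds if not any(dominates(o, s) for o in seeds)]


def select_seed(seeds: list[Seed], rng: random.Random) -> Seed:
    """Assumed selection rule: sample uniformly from the Pareto front."""
    return rng.choice(pareto_front(seeds))


# Illustrative corpus with made-up scores.
corpus = [
    Seed("a", novelty=0.9, crash_score=0.1),  # very novel, far from a crash
    Seed("b", novelty=0.2, crash_score=0.8),  # familiar, close to a crash
    Seed("c", novelty=0.5, crash_score=0.5),  # balanced trade-off
    Seed("d", novelty=0.4, crash_score=0.4),  # dominated by c
    Seed("e", novelty=0.1, crash_score=0.1),  # dominated by a, b, c, and d
]

front = pareto_front(corpus)
print(sorted(s.name for s in front))
print(select_seed(corpus, random.Random(0)).name)
```

In this sketch, seeds `a`, `b`, and `c` survive because each represents a different trade-off between novelty and crash proximity, while `d` and `e` are discarded because another seed is at least as good on both objectives. Selecting from the front rather than from a single weighted score avoids having to fix the relative weight of exploration versus fault triggering in advance.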