🤖 AI Summary
Large language models (LLMs) generally lack implicit information security awareness (ISA), rendering them unable to autonomously detect and refuse potentially hazardous requests—thereby exposing users to cybersecurity threats.
Method: We introduce the first comprehensive, mobile-oriented ISA taxonomy covering all conceptual dimensions, instantiated as a 30-scenario benchmark. We propose the first quantitative ISA evaluation framework for LLMs and conduct empirical analysis via systematic prompt engineering, temperature tuning, and multi-dimensional security scenario design.
Contribution/Results: Our findings reveal that systematic prompting significantly outperforms generation parameters (e.g., temperature) in enhancing ISA—establishing it as a critical lever for safety alignment. Moreover, most LLMs exhibit ISA responses only under explicit security cues, failing to infer risks implicitly. This work establishes a reproducible evaluation paradigm and actionable, prompt-based optimization pathways for secure LLM deployment.
📝 Abstract
The popularity of large language models (LLMs) continues to increase, and LLM-based assistants have become ubiquitous, assisting people of diverse backgrounds in many aspects of life. Significant resources have been invested in the safety of LLMs and their alignment with social norms. However, research examining their behavior from the information security awareness (ISA) perspective is lacking. Chatbots and LLM-based assistants may put unwitting users in harm's way by facilitating unsafe behavior. We observe that the ISA inherent in some of today's most popular LLMs varies significantly, with most models requiring user prompts with a clear security context to utilize their security knowledge and provide safe responses to users. Based on this observation, we created a comprehensive set of 30 scenarios to assess the ISA of LLMs. These scenarios benchmark the evaluated models with respect to all focus areas defined in a mobile ISA taxonomy. Among our findings is that ISA is mildly affected by changing the model's temperature, whereas adjusting the system prompt can substantially impact it. This underscores the necessity of setting the right system prompt to mitigate ISA weaknesses. Our findings also highlight the importance of ISA assessment for the development of future LLM-based assistants.