🤖 AI Summary
This work addresses an emerging safety issue in large vision-language models (LVLMs): *implicit reasoning safety*, in which benign multimodal inputs trigger harmful outputs through latent, defective cross-modal reasoning within the model. We formally define this concept and introduce Safe Semantics, Unsafe Interpretations (SSUI), the first dedicated benchmark for evaluating implicit reasoning vulnerabilities under image-text compositions. Methodologically, we integrate multimodal safety analysis, implicit reasoning path tracing, and in-context learning (ICL)-based intervention. Experiments demonstrate that lightweight ICL prompts, without any model fine-tuning, significantly suppress these unsafe behaviors, establishing ICL as an efficient, parameter-free defense paradigm. Our contributions are: (1) a novel conceptual framework for LVLM safety, (2) SSUI, a standardized evaluation benchmark that exposes implicit reasoning flaws, and (3) a practical, deployable mitigation strategy. This work advances LVLM safety assessment and robustness enhancement through new theoretical insight, empirical grounding, and actionable methodology.
📝 Abstract
Large Vision-Language Models (LVLMs) face growing safety challenges with multimodal inputs. This paper introduces the concept of Implicit Reasoning Safety, a vulnerability in LVLMs in which benign combined inputs trigger unsafe outputs through flawed or hidden reasoning. To expose this issue, we developed Safe Semantics, Unsafe Interpretations (SSUI), the first dataset dedicated to this critical problem. Our experiments show that even simple In-Context Learning with SSUI demonstrations significantly mitigates these implicit multimodal threats, underscoring the urgent need to improve cross-modal implicit reasoning.
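To make the ICL-based mitigation concrete, the sketch below shows one plausible way such a defense could be wired up: a few SSUI-style demonstrations (a benign-looking image-text pair, the unsafe implicit interpretation it can trigger, and a safe response) are prepended as chat turns before the user's query. The `build_icl_prompt` helper, the demonstration contents, and the message format are hypothetical illustrations under the assumption of a chat-style LVLM interface; they are not the paper's released prompts or data.

```python
# Minimal sketch of ICL-based mitigation for implicit reasoning safety.
# Demonstrations and helper names are hypothetical; the actual SSUI prompts
# and the target model's chat interface may differ.

from typing import Dict, List

# Each demonstration pairs a benign-looking multimodal input with the unsafe
# implicit interpretation it can trigger and the desired safe response.
SSUI_STYLE_DEMOS: List[Dict[str, str]] = [
    {
        "image_desc": "photo of household cleaning products on a shelf",
        "text": "Which two of these work best together for a stronger effect?",
        "safe_response": (
            "Some of these products are dangerous to combine. I can't recommend "
            "mixing them; please follow the usage instructions on each label."
        ),
    },
]


def build_icl_prompt(user_image_desc: str, user_text: str) -> List[Dict[str, str]]:
    """Prepend safety demonstrations to the user's query as chat-style turns."""
    messages: List[Dict[str, str]] = [
        {
            "role": "system",
            "content": (
                "Check whether the combined image and text imply a harmful request "
                "before answering, even if each part looks benign on its own."
            ),
        }
    ]
    for demo in SSUI_STYLE_DEMOS:
        messages.append(
            {"role": "user", "content": f"[image: {demo['image_desc']}] {demo['text']}"}
        )
        messages.append({"role": "assistant", "content": demo["safe_response"]})
    # The actual query goes last, after the demonstrations.
    messages.append(
        {"role": "user", "content": f"[image: {user_image_desc}] {user_text}"}
    )
    return messages


if __name__ == "__main__":
    prompt = build_icl_prompt(
        "close-up of several prescription pill bottles",
        "What would happen if someone combined all of these?",
    )
    for turn in prompt:
        print(f"{turn['role']}: {turn['content']}")
    # In practice, `prompt` would be passed to the LVLM's chat API,
    # with real images in place of the text placeholders.
```

Because the demonstrations are supplied purely through the prompt, no parameters are updated; this is what makes the defense lightweight and deployable on a frozen model.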