🤖 AI Summary
Formal properties of feedforward neural networks, particularly large vision-language models, are difficult to infer automatically. Method: We propose a formal rule extraction method grounded in latent-layer neuron activation patterns, the first systematic approach to translating neuron activation values or binary on/off states into verifiable logical rules. These rules encode semantic logic explicitly, with activation patterns as premises and output behaviors as conclusions. By integrating symbolic reasoning with neural analysis, the method supports property explanation, compositional verification, robustness repair, and runtime monitoring across diverse model architectures. Contribution/Results: Experiments demonstrate strong generalization across multiple models, advancing the integration of neural network interpretability and formal verification. The approach enables rigorous, logic-based characterization of neural behavior while remaining scalable and practical.
📝 Abstract
We present Prophecy, a tool for automatically inferring formal properties of feed-forward neural networks. Prophecy is based on the observation that a significant part of the logic of a feed-forward network is captured in the activation status of the neurons at inner layers. Prophecy works by extracting rules whose preconditions are neuron activations (values or on/off statuses) and whose conclusion is a desirable output property, e.g., the prediction being a given class. These rules represent network properties, captured in the hidden layers, that imply the desired output behavior. We present the architecture of the tool, highlight its features, and demonstrate its usage on different types of models and output properties. We give an overview of its applications, including inferring and proving formal explanations of neural networks, compositional verification, run-time monitoring, and repair. We also show novel results highlighting its potential in the era of large vision-language models.
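To make the core idea concrete, here is a minimal sketch (in NumPy, not Prophecy's actual implementation) of extracting rules of the form "activation on/off pattern ⇒ predicted class" from a small ReLU network. The network weights and the rule-mining criterion (keep only patterns that co-occur with a single class) are illustrative assumptions, not the tool's exact algorithm.

```python
import numpy as np

def activation_signature(weights, biases, x):
    """Forward pass through a ReLU MLP, recording the on/off status
    of every hidden neuron, plus the predicted class."""
    sig, h = [], x
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ h + b
        sig.extend(bool(v) for v in pre > 0)  # True = neuron active
        h = np.maximum(pre, 0)
    logits = weights[-1] @ h + biases[-1]
    return tuple(sig), int(np.argmax(logits))

def extract_rules(weights, biases, inputs):
    """Group inputs by hidden-layer signature; a signature that always
    co-occurs with one class yields a candidate rule 'signature => class'."""
    table = {}
    for x in inputs:
        sig, pred = activation_signature(weights, biases, x)
        table.setdefault(sig, set()).add(pred)
    return {sig: classes.pop() for sig, classes in table.items()
            if len(classes) == 1}

# Tiny hand-built 2-2-2 network for illustration.
W1, b1 = np.eye(2), np.zeros(2)
W2, b2 = np.array([[1.0, -1.0], [-1.0, 1.0]]), np.zeros(2)
rules = extract_rules([W1, W2], [b1, b2],
                      [np.array([1.0, -1.0]),
                       np.array([-1.0, 1.0]),
                       np.array([2.0, -1.0])])
```

In Prophecy, such candidate preconditions are then checked formally (e.g., with a solver) rather than only empirically; this sketch shows only the pattern-mining step.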