🤖 AI Summary
Labeled Transition Systems (LTS) lack cognitively grounded, quantifiable measures of human interpretability.
Method: We propose and empirically validate seven interpretability metrics for LTS designs, integrating software engineering metrics, graph-theoretic analysis, Bradley–Terry pairwise comparison modeling, and Kendall's Tau correlation testing. These metrics are embedded into Fortis—a formal design repair framework—to enable human-cognition-aware automatic reordering of LTS specifications.
Contribution/Results: Four metrics—including Albin complexity—significantly predict human comprehension performance (p < 0.01), establishing the first empirical link between graph-theoretic measures and LTS interpretability. Optimization guided by Albin complexity reduces human interpretation time by 39%. Our approach provides a reusable theoretical framework and practical methodology for enhancing the explainability of formal methods, bridging cognitive science, graph theory, and formal specification engineering.
📝 Abstract
Labeled Transition Systems (LTS) are integral to model checking and design repair tools. System engineers frequently examine LTS designs during model checking or design repair to debug, identify inconsistencies, and validate system behavior. Despite their significance, no prior research has examined human comprehension of these designs. To address this gap, we draw on traditional software engineering and graph theory to identify seven key metrics: cyclomatic complexity, state space size, average branching factor, maximum depth, Albin complexity, modularity, and redundancy. We created a dataset of 148 LTS designs, sampled 48 of them for 324 paired comparisons, and ranked them using the Bradley–Terry model. Through Kendall's Tau correlation analysis, we found that Albin complexity ($\tau = 0.444$), state space size ($\tau = 0.420$), cyclomatic complexity ($\tau = 0.366$), and redundancy ($\tau = 0.315$) most accurately reflect human comprehension of LTS designs. To showcase the metrics' utility, we applied the Albin complexity metric within the Fortis design repair tool to rank system redesigns. This ranking reduced annotators' comprehension time by 39%, suggesting that metrics emphasizing human factors can enhance formal design interpretability.
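The ranking-and-correlation pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the win counts and metric scores below are hypothetical, the Bradley–Terry strengths are fit with the standard MM iteration, and Kendall's tau-a is computed directly from concordant/discordant pairs.

```python
import itertools

def bradley_terry(n_items, wins, n_iters=200):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] = number of comparisons in which design i was preferred
    over design j. Uses the standard MM update and normalizes the
    strengths to sum to 1.
    """
    p = [1.0] * n_items
    for _ in range(n_iters):
        new_p = []
        for i in range(n_items):
            num = sum(wins[i][j] for j in range(n_items) if j != i)
            den = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                      for j in range(n_items) if j != i)
            new_p.append(num / den if den > 0 else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]
    return p

def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length score lists."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in itertools.combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy example: 3 LTS designs with hypothetical comparison outcomes
# ("which design is easier to understand?").
wins = [[0, 4, 5],
        [1, 0, 3],
        [0, 2, 0]]
strengths = bradley_terry(3, wins)

# Hypothetical complexity-metric scores for the same designs.
metric = [1.2, 2.8, 3.5]

# Higher complexity should mean lower comprehensibility, so we correlate
# the metric against the *negated* comprehensibility strengths.
tau = kendall_tau([-s for s in strengths], metric)
```

In this toy data, design 0 wins most comparisons and has the lowest metric score, so the negated strengths and the metric agree on every pair and tau comes out at 1.0; real data, as in the abstract, yields partial agreement (e.g. $\tau = 0.444$).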