🤖 AI Summary
Labeled Transition Systems (LTS) lack cognitively grounded, quantifiable measures of human interpretability.
Method: We propose and empirically validate seven interpretability metrics for LTS designs, integrating software engineering metrics, graph-theoretic analysis, Bradley–Terry pairwise comparison modeling, and Kendall's Tau correlation testing. These metrics are embedded into Fortis—a formal design repair framework—to enable human-cognition-aware automatic reordering of LTS specifications.
Contribution/Results: Four metrics—including Albin complexity—significantly predict human comprehension performance (p < 0.01), establishing the first empirical link between graph-theoretic measures and LTS interpretability. Optimization guided by Albin complexity reduces human interpretation time by 39%. Our approach provides a reusable theoretical framework and practical methodology for enhancing the explainability of formal methods, bridging cognitive science, graph theory, and formal specification engineering.
📝 Abstract
Labeled Transition Systems (LTS) are integral to model checking and design repair tools. System engineers frequently examine LTS designs during model checking or design repair to debug, identify inconsistencies, and validate system behavior. Despite their significance, no prior research has examined human comprehension of these designs. To address this gap, we draw on traditional software engineering and graph theory to identify seven key metrics: cyclomatic complexity, state space size, average branching factor, maximum depth, Albin complexity, modularity, and redundancy. We created a dataset of 148 LTS designs, sampled 48 of them for 324 paired comparisons, and ranked them using the Bradley–Terry model. Through Kendall's Tau correlation analysis, we found that Albin complexity ($\tau = 0.444$), state space size ($\tau = 0.420$), cyclomatic complexity ($\tau = 0.366$), and redundancy ($\tau = 0.315$) most accurately reflect human comprehension of LTS designs. To showcase the metrics' utility, we applied the Albin complexity metric within the Fortis design repair tool to rank system redesigns. This ranking reduced annotators' comprehension time by 39%, suggesting that metrics emphasizing human factors can enhance formal design interpretability.
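The ranking-and-correlation pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the win counts and metric scores below are hypothetical, the Bradley–Terry strengths are fit with the standard MM iteration, and Kendall's tau-a is computed directly from concordant/discordant pairs.

```python
import itertools

def bradley_terry(n_items, wins, n_iters=200):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] = number of comparisons in which design i was preferred
    over design j. Uses the standard MM update and normalizes the
    strengths to sum to 1.
    """
    p = [1.0] * n_items
    for _ in range(n_iters):
        new_p = []
        for i in range(n_items):
            num = sum(wins[i][j] for j in range(n_items) if j != i)
            den = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                      for j in range(n_items) if j != i)
            new_p.append(num / den if den > 0 else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]
    return p

def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length score lists."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in itertools.combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy example: 3 LTS designs with hypothetical comparison outcomes
# ("which design is easier to understand?").
wins = [[0, 4, 5],
        [1, 0, 3],
        [0, 2, 0]]
strengths = bradley_terry(3, wins)

# Hypothetical complexity-metric scores for the same designs.
metric = [1.2, 2.8, 3.5]

# Higher complexity should mean lower comprehensibility, so we correlate
# the metric against the *negated* comprehensibility strengths.
tau = kendall_tau([-s for s in strengths], metric)
```

In this toy data, design 0 wins most comparisons and has the lowest metric score, so the negated strengths and the metric agree on every pair and tau comes out at 1.0; real data, as in the abstract, yields partial agreement (e.g. $\tau = 0.444$).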