🤖 AI Summary
Existing approaches for characterizing structural differences between ordered labeled trees (e.g., XML/JSON) lack formal specifications and learnability guarantees.
Method: We propose a pattern-based specification language for tree transformations and formulate tree-difference explanation as a rule-learning problem, showing NP-hardness even for very restricted variants. The approach combines pattern matching, formal specification design, and SAT solving to synthesize compact, verifiable transformation rules.
Contribution/Results: The method targets rule sets that are concise, semantically clear, and formally specified. Applied to code-evolution data from CS education research, the SAT-based approach is used to find small, interpretable rule sets that capture the structural changes between tree pairs, supporting human understanding of tree differences.
📝 Abstract
Explaining why and how a tree $t$ structurally differs from another tree $t^\star$ is a question that is encountered throughout computer science, including in understanding tree-structured data such as XML or JSON data. In this article, we explore how to learn explanations for structural differences between pairs of trees from sample data: suppose we are given a set $\{(t_1, t_1^\star), \dots, (t_n, t_n^\star)\}$ of pairs of labelled, ordered trees; is there a small set of rules that explains the structural differences between all pairs $(t_i, t_i^\star)$? This raises two research questions: (i) what is a good notion of "rule" in this context? and (ii) how can sets of rules explaining a data set be learned algorithmically? We explore these questions from the perspective of database theory by (1) introducing a pattern-based specification language for tree transformations; (2) exploring the computational complexity of variants of the above algorithmic problem, e.g., showing NP-hardness for very restricted variants; and (3) discussing how to solve the problem for data from CS education research using SAT solvers.
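To make the learning problem concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the specification language or the SAT-based method of the article. Here a "rule" is only a node relabelling `old -> new` applied everywhere in a tree, trees are nested tuples, and the smallest explaining rule set is found by brute-force enumeration instead of a SAT solver. The names `node`, `apply_rules`, `label_diffs`, and `smallest_rule_set` are hypothetical and introduced purely for illustration.

```python
from itertools import combinations

def node(label, *children):
    """An ordered, labelled tree: (label, tuple of child trees)."""
    return (label, children)

def apply_rules(t, rules):
    """Relabel every node of t according to the dict rules (old -> new)."""
    label, children = t
    return (rules.get(label, label),
            tuple(apply_rules(c, rules) for c in children))

def label_diffs(t, t_star):
    """Collect (old, new) label pairs at aligned positions of two trees."""
    (a, cs), (b, ds) = t, t_star
    diffs = {(a, b)} if a != b else set()
    for c, d in zip(cs, ds):
        diffs |= label_diffs(c, d)
    return diffs

def smallest_rule_set(pairs):
    """Return a smallest relabelling dict that maps every t to its t_star,
    or None if no set of relabelling rules explains all pairs."""
    candidates = sorted(set().union(*(label_diffs(t, s) for t, s in pairs)))
    for k in range(len(candidates) + 1):          # try small sets first
        for combo in combinations(candidates, k):
            rules = dict(combo)
            if len(rules) != k:                   # two rules share a source label
                continue
            if all(apply_rules(t, rules) == s for t, s in pairs):
                return rules
    return None

# Illustrative data (assumed, not taken from the article's data set).
pairs = [
    (node("while", node("cond"), node("print")),
     node("for", node("cond"), node("return"))),
    (node("print"), node("return")),
]
print(smallest_rule_set(pairs))
```

The enumeration is exponential in the number of candidate rules, which is one reason a solver-based search is attractive for realistic data. The relabelling-only rule format also cannot explain insertions or deletions of subtrees, so such pairs yield `None` in this sketch.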