🤖 AI Summary
This paper investigates active learning of an unknown binary decision tree from membership queries alone, with formal guarantees in large hypothesis spaces. The proposed framework is based on symbolic logic. It encodes the space of bounded-depth decision trees as a SAT formula and uses the approximate model counter ApproxMC to estimate how much each candidate query would shrink the hypothesis space, which enables near-optimal query selection. Observed answers are added incrementally to the CNF encoding, and when the model count stops decreasing, a functional equivalence check verifies that all remaining hypotheses compute the same function, ensuring convergence and correctness. Unlike conventional approaches that rely on heuristics or exhaustive enumeration, the framework combines theoretical rigor with scalability. Experiments show that the method reliably converges to the target tree using only a handful of queries while retaining formal correctness guarantees.
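To make the idea of a CNF-encoded version space concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's. The trees are depth-1 stumps over n boolean features. The variable layout and the helpers `stump_cnf`, `add_observation`, `count_models`, and `best_query` are hypothetical. Brute-force exact counting stands in for ApproxMC, which is viable only for tiny formulas. Each membership-query answer adds clauses, and a query is scored by the worst-case model count over its two possible answers. This is one plausible selection criterion; the paper's exact criterion may differ.

```python
import itertools

# Hypothetical variable layout for a depth-1 tree (stump) over n boolean features:
#   var f + 1 (f = 0..n-1): "the root tests feature f"
#   var n + 1 / n + 2     : label of the leaf reached when x[f] = 0 / x[f] = 1
# Literals follow the DIMACS convention: positive int = variable, negative = negation.

def stump_cnf(n):
    """CNF whose models are exactly the stumps: one tested feature, two leaf labels."""
    sel = list(range(1, n + 1))
    clauses = [list(sel)]  # at least one feature is tested
    clauses += [[-a, -b] for a, b in itertools.combinations(sel, 2)]  # at most one
    return clauses

def add_observation(clauses, n, x, y):
    """Strengthen the CNF with the answer y to the membership query x."""
    for f in range(n):
        leaf = n + 1 + x[f]  # leaf variable reached on x if the root tests f
        # "root tests f" implies "that leaf is labeled y"
        clauses.append([-(f + 1), leaf if y else -leaf])

def count_models(clauses, num_vars):
    """Exact brute-force model count (a stand-in for ApproxMC on tiny formulas)."""
    count = 0
    for bits in itertools.product((False, True), repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            count += 1
    return count

def best_query(clauses, n):
    """Return (x, worst): the input minimizing the larger of its two post-answer counts."""
    best = None
    for x in itertools.product((0, 1), repeat=n):
        counts = []
        for y in (0, 1):
            trial = list(clauses)  # shallow copy; the original CNF is not modified
            add_observation(trial, n, x, y)
            counts.append(count_models(trial, n + 2))
        worst = max(counts)
        if best is None or worst < best[1]:
            best = (x, worst)
    return best
```

With n = 3 the unconstrained formula has 12 models (3 feature choices times 4 leaf labelings). For a constant-1 target, after the answers to (0,0,0) and (1,1,1) the count stays at 3: the three surviving stumps test different features but compute the same function, so no further query can reduce the count. This is the kind of stagnation that the functional equivalence check is meant to resolve.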
📝 Abstract
We consider the problem of actively learning an unknown binary decision tree using only membership queries, a setting in which the learner must reason about a large hypothesis space while maintaining formal guarantees. Rather than enumerating candidate trees or relying on heuristic impurity or entropy measures, we encode the entire space of bounded-depth decision trees symbolically as SAT formulas. We propose a symbolic method for active learning of decision trees, in which approximate model counting is used to estimate the reduction of the hypothesis space caused by each potential query, enabling near-optimal query selection without full model enumeration. The resulting learner incrementally strengthens a CNF representation based on observed query outcomes, and the approximate model counter ApproxMC is invoked to quantify the remaining version space in a sound and scalable manner. Additionally, when ApproxMC stagnates, a functional equivalence check is performed to verify that all remaining hypotheses are functionally identical. Experiments on decision trees show that the method reliably converges to the correct model using only a handful of queries, while retaining a rigorous SAT-based foundation suitable for formal analysis and verification.
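The overall query loop, including the stagnation case, can be sketched as follows. This is a hedged illustration rather than the authors' implementation. Hypotheses are enumerated explicitly instead of being SAT-encoded, exact counting replaces ApproxMC, and the minimax criterion is an assumed reading of "near-optimal query selection". The equivalence check is a brute-force search for a distinguishing input rather than a SAT-based check. All function names are hypothetical, and the target is assumed to be representable at the chosen depth.

```python
import itertools

def enumerate_trees(n, depth):
    """All decision trees of depth <= depth over n boolean features.
    A leaf is 0 or 1; an internal node is (feature, subtree_if_0, subtree_if_1)."""
    if depth == 0:
        return [0, 1]
    subs = enumerate_trees(n, depth - 1)
    return [0, 1] + [(f, lo, hi) for f in range(n) for lo in subs for hi in subs]

def evaluate(tree, x):
    """Follow the branches selected by x until a leaf label is reached."""
    while not isinstance(tree, int):
        f, lo, hi = tree
        tree = hi if x[f] else lo
    return tree

def select_query(space, inputs):
    """Input minimizing the worst-case number of surviving hypotheses."""
    best_x, best_worst = None, None
    for x in inputs:
        ones = sum(evaluate(t, x) for t in space)
        worst = max(ones, len(space) - ones)
        if best_worst is None or worst < best_worst:
            best_x, best_worst = x, worst
    return best_x, best_worst

def distinguishing_input(space, inputs):
    """Brute-force equivalence check: an input on which two survivors differ, or None."""
    ref = space[0]
    for x in inputs:
        if any(evaluate(t, x) != evaluate(ref, x) for t in space[1:]):
            return x
    return None

def learn(oracle, n, depth):
    """Query the membership oracle until all surviving trees compute one function."""
    inputs = list(itertools.product((0, 1), repeat=n))
    space = enumerate_trees(n, depth)
    queries = 0
    while True:
        x, worst = select_query(space, inputs)
        if worst == len(space):
            # Stagnation: no single query can shrink the count any further.
            x = distinguishing_input(space, inputs)
            if x is None:
                return space[0], queries  # all survivors are functionally identical
        y = oracle(x)
        queries += 1
        # Keep only hypotheses consistent with the answer (analogue of adding clauses).
        space = [t for t in space if evaluate(t, x) == y]
```

For example, `learn(lambda x: x[0] ^ x[1], 3, 2)` returns a depth-2 tree computing XOR of the first two features, together with the number of queries asked. With exact counting over all 2^n inputs, stagnation already implies that the survivors agree everywhere, so `distinguishing_input` always returns `None` here. With an approximate counter or a sampled candidate set, the check would be what certifies that the remaining models, typically many syntactically different trees, compute a single function.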