🤖 AI Summary
To address the pervasive scalability bottleneck in Datalog-based program analysis, this paper proposes a general-purpose dynamic pruning method based on the *choice* construct. The approach achieves adaptive control of analysis scale without modifying the underlying analysis logic: the working set is bounded through programmer-controlled projections of relations, and derivations are pruned during incremental evaluation once a projection exceeds a desired cardinality. The key contribution is the first generalization of Soufflé's native *choice* mechanism (previously used to express functional dependencies and worklist algorithms) into a declarative, nearly universal pruning framework, enabling efficient, soundness-preserving optimization across arbitrary Datalog analysis architectures. Experimental evaluation on Doop (for Java bytecode) and Gigahorse (for Ethereum smart contracts) demonstrates speedups of over 20× on the hardest inputs, with near-negligible loss of completeness, significantly outperforming existing static or analysis-specific pruning techniques.
📝 Abstract
In this work, we present a simple, uniform, and elegant solution to the problem, with stunning practical effectiveness and application to virtually any Datalog-based analysis. The approach consists of leveraging the choice construct, supported natively in modern Datalog engines like Soufflé. The choice construct allows the definition of functional dependencies in a relation and has been used in the past for expressing worklist algorithms. We show a near-universal construction that allows the choice construct to flexibly limit evaluation of predicates. The technique is applicable to practically any analysis architecture imaginable, since it adaptively prunes evaluation results when a (programmer-controlled) projection of a relation exceeds a desired cardinality. We apply the technique to probably the largest pre-existing Datalog analysis frameworks in existence: Doop (for Java bytecode) and the main client analyses from the Gigahorse framework (for Ethereum smart contracts). Without needing to understand the existing analysis logic and with minimal, local-only changes, the performance of each framework increases dramatically, by over 20x for the hardest inputs, with near-negligible sacrifice in completeness.
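To make the abstract's starting point concrete, here is a minimal sketch of Soufflé's `choice-domain` annotation, the native mechanism the paper builds on. The annotation enforces a functional dependency: for each value of the chosen columns, only the first tuple derived is kept and all later derivations are discarded. The relation and constant names below are illustrative, not taken from the paper:

```
// Input: edges of a directed graph.
.decl edge(u: symbol, v: symbol)
edge("a", "b").
edge("a", "c").
edge("b", "c").

// choice-domain v: at most one tuple is kept per value of v,
// so every reached node records exactly one parent. The recursion
// still saturates, but redundant derivations are never materialized.
.decl reach(u: symbol, v: symbol) choice-domain v
reach("a", "a").
reach(u, v) :- reach(_, u), edge(u, v).

.output reach
```

The paper's construction generalizes this idea from "one tuple per key" to a tunable cardinality bound: once a chosen projection of a relation holds as many tuples as the programmer allows, further derivations for that projection are pruned, trading a small amount of completeness for large evaluation savings.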