🤖 AI Summary
To address the high storage overhead and low query efficiency of provenance graphs generated from audit logs, this paper proposes Dehydrator, a lightweight, learnable compression and querying framework. Dehydrator introduces two key ideas: (1) a dual redundancy elimination mechanism that combines field-level mapping encoding with structure-level hierarchical graph encoding; and (2) the first application of sequence generation modeling to provenance graph compression and reconstruction, which yields an end-to-end learnable, lightweight storage system. In an evaluation on seven datasets totaling over one billion audit log entries, Dehydrator reduces storage space by 84.55%. Its batch query throughput also exceeds that of PostgreSQL, Neo4j, and Leonard by 7.36×, 7.16×, and 16.17×, respectively, so the storage savings do not come at the cost of query performance.
📝 Abstract
As the scope and impact of cyber threats have expanded, analysts use audit logs to hunt threats and investigate attacks. Provenance graphs constructed from kernel logs are increasingly regarded as an ideal data source because of their strong semantic expressiveness and their ability to correlate historical attack activity. However, storing provenance graphs in traditional databases incurs high storage overhead, given the high frequency of kernel events and the persistence of attacks. To address this, we propose Dehydrator, an efficient provenance graph storage system. For the logs generated by auditing frameworks, Dehydrator uses field mapping encoding to filter field-level redundancy, hierarchical encoding to filter structure-level redundancy, and finally trains a deep neural network to support batch querying. We have conducted evaluations on seven datasets totaling over one billion log entries. Experimental results show that Dehydrator reduces storage space by 84.55%. Dehydrator is 7.36 times more efficient than PostgreSQL, 7.16 times more efficient than Neo4j, and 16.17 times more efficient than Leonard (the work most closely related to Dehydrator, published at USENIX Security '23).
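The abstract does not spell out how the two encoding stages work, so the following is only a minimal sketch of the general idea, not Dehydrator's implementation. It assumes that field mapping amounts to dictionary encoding (each repeated string value is replaced by a small integer ID) and that hierarchical encoding can be approximated by storing each source node once with its edges nested beneath it. The record layout (`src`, `op`, `dst`), the function names, and the sample events are all hypothetical, and the learned deep neural network stage is omitted.

```python
from collections import defaultdict

FIELDS = ("src", "op", "dst")  # hypothetical record layout, not from the paper


def build_field_maps(records, fields=FIELDS):
    """Assign each distinct value of each field a small integer ID."""
    maps = {f: {} for f in fields}
    for rec in records:
        for f in fields:
            table = maps[f]
            if rec[f] not in table:
                table[rec[f]] = len(table)
    return maps


def encode(records, maps, fields=FIELDS):
    """Field-level step: replace repeated strings with their integer IDs."""
    return [tuple(maps[f][rec[f]] for f in fields) for rec in records]


def group_by_source(encoded):
    """Structure-level step: keep each source ID once, edges nested under it."""
    grouped = defaultdict(list)
    for src, op, dst in encoded:
        grouped[src].append((op, dst))
    return dict(grouped)


def decode(grouped, maps):
    """Rebuild the original records (event order across sources is not kept)."""
    inverse = {f: {i: v for v, i in m.items()} for f, m in maps.items()}
    return [
        {"src": inverse["src"][s], "op": inverse["op"][o], "dst": inverse["dst"][d]}
        for s, edges in grouped.items()
        for o, d in edges
    ]


# Hypothetical audit events, invented for illustration only.
events = [
    {"src": "/usr/bin/bash", "op": "execve", "dst": "/usr/bin/curl"},
    {"src": "/usr/bin/curl", "op": "connect", "dst": "10.0.0.5:443"},
    {"src": "/usr/bin/bash", "op": "read", "dst": "/etc/passwd"},
    {"src": "/usr/bin/curl", "op": "write", "dst": "/tmp/payload"},
]

maps = build_field_maps(events)
encoded = encode(events, maps)
grouped = group_by_source(encoded)
print(grouped)
```

In this toy version the long path strings are stored once in the mapping tables, and each source process appears once in `grouped` no matter how many events it emits. A real system would also need to preserve timestamps and event order, which this sketch drops.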