How to Train Your Filter: Should You Learn, Stack or Adapt?

📅 2026-02-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of unified evaluation and the unclear applicability boundaries among learned, stacked, and adaptive Bloom-filter variants. It presents the first end-to-end empirical comparison of these three filter families on real-world datasets and query workloads, systematically analyzing their trade-offs in false positive rate (FPR), latency, and robustness. The experiments show that learned filters achieve the lowest FPRs but suffer from high and unstable query latency; stacked filters reduce FPR by up to three orders of magnitude when the query workload is known; and adaptive filters provide both robustness and theoretical guarantees in dynamic or adversarial environments. The study closes a gap in the literature by offering clear guidelines for selecting the appropriate filter type for a given deployment scenario.
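The adaptive paradigm summarized above, updating the filter in response to false-positive feedback, can be illustrated with a toy Python sketch. All names here (`CoarseFilter`, `AdaptiveFilter`, `report_false_positive`) are illustrative assumptions, not APIs from the paper, and real adaptive filters (e.g. adaptive cuckoo filters) rewire their compact representation rather than keeping an exact exception list as this sketch does:

```python
import zlib

class CoarseFilter:
    """Stand-in for a Bloom filter: a single hash into a tiny bit array,
    so false positives are common. Illustrative only."""
    def __init__(self, num_bits: int = 16):
        self.num_bits = num_bits
        self.bits = 0
    def add(self, item: str) -> None:
        self.bits |= 1 << (zlib.crc32(item.encode()) % self.num_bits)
    def __contains__(self, item: str) -> bool:
        return bool(self.bits >> (zlib.crc32(item.encode()) % self.num_bits) & 1)

class AdaptiveFilter:
    """Toy adaptive wrapper: when the ground-truth store reports a false
    positive, remember the offending key and answer "absent" thereafter."""
    def __init__(self, base: CoarseFilter):
        self.base = base
        self._fp_exceptions: set[str] = set()
    def query(self, item: str) -> bool:
        return item not in self._fp_exceptions and item in self.base
    def report_false_positive(self, item: str) -> None:
        self._fp_exceptions.add(item)

members = {"alpha", "beta"}
base = CoarseFilter(num_bits=16)
for m in members:
    base.add(m)
af = AdaptiveFilter(base)

# Find a non-member the coarse filter wrongly accepts, then report it.
fp = next(k for k in (f"key{i}" for i in range(1000))
          if k not in members and af.query(k))
af.report_false_positive(fp)
print(af.query(fp))  # → False: the reported false positive is fixed
```

The key property matching the paper's description is that adaptation needs no workload assumptions; the filter only reacts to feedback it actually receives.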

📝 Abstract
Filters are ubiquitous in computer science, enabling space-efficient approximate membership testing. Since Bloom filters were introduced in 1970, decades of work have improved their space efficiency and performance. Recently, three new paradigms have emerged that offer orders-of-magnitude improvements in false positive rates (FPRs) by using information beyond the input set: (1) learned filters train a model to distinguish members from non-members, (2) stacked filters use negative workload samples to build cascading layers, and (3) adaptive filters update their internal representation in response to false-positive feedback. Yet each paradigm targets specific use cases, introduces complex configuration tuning, and has been evaluated in isolation. This results in unclear trade-offs and a gap in understanding of how these approaches compare and when each is most appropriate. This paper presents the first comprehensive evaluation of learned, stacked, and adaptive filters across real-world datasets and query workloads. Our results reveal critical trade-offs: (1) Learned filters achieve up to 10^2 times lower FPRs but exhibit high variance and lack robustness under skewed or dynamic workloads. Critically, model inference overhead leads to query latencies up to 10^4 times slower than stacked or adaptive filters. (2) Stacked filters reliably achieve up to 10^3 times lower FPRs on skewed workloads but require workload knowledge. (3) Adaptive filters are robust across settings, achieving up to 10^3 times lower FPRs under adversarial queries without workload assumptions. Based on our analysis, learned filters suit stable workloads where input features enable effective model training and space constraints are paramount, stacked filters excel when reliable query distributions are known, and adaptive filters are the most generalizable, providing robust, theoretically bounded guarantees even in dynamic or adversarial environments.
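For context, the classic 1970 Bloom filter that all three paradigms build on can be sketched in a few lines of Python. The parameter choices below (8192 bits, 4 hash functions, double hashing via SHA-256) are illustrative assumptions, not values from the paper:

```python
import hashlib

class BloomFilter:
    """Classic Bloom filter: k hash functions over an m-bit array.
    Members are always reported present; non-members may be reported
    present with some false positive probability."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit hashes (double hashing).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every corresponding bit is set.
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

bf = BloomFilter(num_bits=8192, num_hashes=4)
for key in ["apple", "banana", "cherry"]:
    bf.add(key)

print("apple" in bf)   # → True: no false negatives, ever
print("durian" in bf)  # almost always False; false positives are possible
```

The paradigms the abstract contrasts all start from this structure: learned filters replace the hash-based membership test with a trained model plus a small backup filter, stacked filters cascade several such filters trained against sampled negatives, and adaptive filters mutate the bit-level representation when false positives are reported.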
Problem

Research questions and friction points this paper is trying to address.

learned filters
stacked filters
adaptive filters
false positive rate
approximate membership testing
Innovation

Methods, ideas, or system contributions that make the work stand out.