🤖 AI Summary
This work addresses the challenge of efficiently filtering businesses by operating hours in large-scale location search systems, where conventional approaches suffer from either low query efficiency or index bloat. The authors propose Timehash, a hierarchical time indexing algorithm that builds a customizable hash structure through multi-resolution time encoding, with hierarchy levels selected according to the data distribution. Timehash supports complex range queries involving rest periods (break times) and irregular schedules. An evaluation on up to 12.6 million synthetic points of interest (POIs) generated from production distributions shows that a five-level hierarchy reduces the average number of index entries per document to 5.6, a 99.1% reduction compared to minute-level indexing, while achieving 100% query accuracy with no false positives or false negatives. Per-document cost stays constant as the dataset grows, so the approach combines expressiveness with scalability.
📝 Abstract
Temporal range filtering is a critical operation in large-scale search systems, particularly for location-based services that need to filter businesses by operating hours. Traditional approaches suffer either from poor query performance (scope filtering) or from index-size explosion (minute-level indexing).
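To make the index-size problem concrete, here is a minimal sketch of minute-level indexing under assumed conventions: times are minutes since midnight, an opening interval is half-open `[open, close)`, and the term format `m:<minute>` and both function names are hypothetical. Every open minute becomes one index term, so a single 09:00-18:00 schedule already emits 540 terms for one day.

```python
# Hypothetical minute-level baseline: one index term per open minute.
# ASSUMPTIONS: minutes since midnight, half-open [open_min, close_min) intervals,
# and an illustrative term format "m:<minute>".


def minute_terms(open_min, close_min):
    """Index terms for a business open on [open_min, close_min)."""
    return [f"m:{m}" for m in range(open_min, close_min)]


def minute_query_term(t):
    """A point query at minute t looks up exactly one term."""
    return f"m:{t}"


terms = minute_terms(9 * 60, 18 * 60)  # open 09:00-18:00
print(len(terms))  # 540 terms for a single document and a single day
print(minute_query_term(12 * 60) in terms)  # True: open at 12:00
```

Queries stay cheap (one term), but the index grows linearly with the number of open minutes, which is the explosion the hierarchical approach targets.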
We present Timehash, a novel hierarchical time indexing algorithm that reduces index size by over 99% compared to minute-level indexing while maintaining 100% precision. Timehash employs a flexible multi-resolution strategy with customizable hierarchy levels. Through an empirical analysis of the distributions of 12.6 million business records from a production location search service, we demonstrate a data-driven methodology for selecting the hierarchy best suited to a given data distribution.
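The multi-resolution idea can be illustrated with a minimal sketch, assuming a scheme in which each opening interval is covered by aligned time buckets drawn from a fixed set of levels. The five bucket widths `(360, 60, 10, 5, 1)` minutes, the `"<width>:<bucket>"` term format, and the functions `encode`, `is_open`, and `choose_levels` are all hypothetical and are not taken from the paper. Because each width divides the next coarser one and the finest width is one minute, the cover is exact, so a point query (one term per level) matches a document if and only if the queried minute lies inside an interval. `choose_levels` stands in for data-driven level selection with a deliberately simple rule: minimize the average number of terms per document on a sample.

```python
# Minimal sketch of hierarchical time-bucket indexing in the spirit of Timehash.
# ASSUMPTIONS (hypothetical, not the paper's exact scheme): times are minutes
# since midnight, intervals are half-open [start, end), terms look like
# "<width>:<bucket>", and each bucket width divides the next coarser one.
LEVELS = (360, 60, 10, 5, 1)  # five hypothetical levels, coarse -> fine


def encode(start, end, levels=LEVELS):
    """Cover [start, end) left to right with the largest aligned bucket that fits."""
    terms = []
    t = start
    while t < end:
        for width in levels:
            if t % width == 0 and t + width <= end:
                terms.append(f"{width}:{t // width}")
                t += width
                break
        else:  # no bucket fits; cannot happen when 1 is the finest width
            raise ValueError("include a 1-minute level for an exact cover")
    return terms


def query_terms(t, levels=LEVELS):
    """The one bucket per level that contains minute t."""
    return {f"{width}:{t // width}" for width in levels}


def is_open(doc_terms, t, levels=LEVELS):
    """Exact match when doc_terms were built with the same levels."""
    return not query_terms(t, levels).isdisjoint(doc_terms)


def avg_terms(intervals, levels):
    """Average index terms per document for a sample of (start, end) intervals."""
    return sum(len(encode(s, e, levels)) for s, e in intervals) / len(intervals)


def choose_levels(intervals, candidates):
    """Hypothetical selection rule: the candidate hierarchy with the fewest terms."""
    return min(candidates, key=lambda levels: avg_terms(intervals, levels))


doc = encode(9 * 60, 18 * 60)  # ['60:9', '60:10', '60:11', '360:2']
print(is_open(doc, 12 * 60))  # True
print(is_open(doc, 18 * 60))  # False: the interval is half-open

sample = [(540, 1080), (600, 1320)]  # toy intervals, not real data
print(choose_levels(sample, [(60, 30, 10, 5, 1), (360, 60, 10, 5, 1)]))
# (360, 60, 10, 5, 1): 5.5 terms per document on average versus 10.5
```

The 09:00-18:00 schedule that needs 540 minute-level terms needs only 4 terms here. Every query expands to one term per level, so adding levels trades query cost for index size; a realistic selection procedure would likely weigh both, which this toy rule ignores.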
We evaluated Timehash on up to 12.6 million synthetic POIs generated from these production distributions. Experimental results show that a five-level hierarchy reduces the average number of index terms to 5.6 per document (a 99.1% reduction versus minute-level indexing), with zero false positives and zero false negatives. Scalability benchmarks confirm a constant per-document cost from 100K to 12.6M POIs, and Timehash supports complex scenarios such as break times and irregular schedules. The approach generalizes to other temporal filtering problems in search systems, e-commerce, and reservation platforms.
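Break times and irregular schedules can be handled in a bucket-based scheme by encoding each opening interval separately and taking the union of the terms. The sketch below assumes conventions that are hypothetical rather than taken from the paper: weekdays `0`-`6` with `0` = Monday, a `d<day>|` prefix on every term so days never collide, overnight intervals (close at or before open) split at midnight, and the same illustrative five levels as a generic bucket cover. A gap such as a 15:00-17:00 break simply produces no terms, so queries inside it cannot match.

```python
# Sketch of break times and irregular schedules on top of hierarchical buckets.
# ASSUMPTIONS (hypothetical): weekdays 0-6 with 0 = Monday, minutes since
# midnight, half-open intervals, and terms of the form "d<day>|<width>:<bucket>".
LEVELS = (360, 60, 10, 5, 1)
DAY_MINUTES = 1440


def encode(start, end, levels=LEVELS):
    """Cover [start, end) with the largest aligned buckets (finest width must be 1)."""
    terms, t = [], start
    while t < end:
        width = next(w for w in levels if t % w == 0 and t + w <= end)
        terms.append(f"{width}:{t // width}")
        t += width
    return terms


def encode_schedule(schedule, levels=LEVELS):
    """schedule: (weekday, open_min, close_min) rows, one row per opening interval."""
    terms = set()
    for day, open_min, close_min in schedule:
        if close_min <= open_min:  # wraps past midnight: split across two days
            spans = [(day, open_min, DAY_MINUTES), ((day + 1) % 7, 0, close_min)]
        else:
            spans = [(day, open_min, close_min)]
        for d, s, e in spans:
            terms.update(f"d{d}|{term}" for term in encode(s, e, levels))
    return terms


def is_open(doc_terms, day, t, levels=LEVELS):
    """Probe one term per level for the given weekday and minute."""
    return any(f"d{day}|{w}:{t // w}" in doc_terms for w in levels)


# Monday lunch and dinner with a 15:00-17:00 break; Friday 22:00-02:00 overnight.
doc = encode_schedule([(0, 660, 900), (0, 1020, 1320), (4, 1320, 120)])
print(len(doc))  # 13 terms
print(is_open(doc, 0, 12 * 60))  # True: Monday 12:00
print(is_open(doc, 0, 16 * 60))  # False: inside the break
print(is_open(doc, 5, 60))  # True: Saturday 01:00, carried over from Friday
```

Splitting overnight intervals at midnight keeps every term tied to a single calendar day, which is one simple way to keep point queries exact; the paper's own handling of these cases may differ.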