🤖 AI Summary
Existing methods for exact similarity search over data series are fragmented and often tied to specific execution environments, so no unified, efficient cross-platform solution exists. This work proposes and open-sources DaiSy, the first unified framework for exact similarity search that supports disk-based, in-memory, GPU-accelerated, and distributed execution and handles both data series and vector data. DaiSy integrates multiple state-of-the-art algorithms and provides both C++ and Python interfaces, which eases integration and deployment in large-scale settings.
📝 Abstract
Exact similarity search over large collections of data series is a fundamental operation in modern applications, yet existing solutions are often fragmented, specialized, or tailored to specific execution environments. In this paper, we present DaiSy, a unified library for exact data series similarity search that integrates multiple state-of-the-art algorithms within a single, coherent framework. DaiSy is the first library to support exact similarity search across diverse execution environments, with implementations for disk-based, in-memory, GPU-accelerated, and distributed similarity search. Although designed for data series, DaiSy is also directly applicable to exact similarity search over vector data, enabling its use in a broader range of applications. The library provides both C++ and Python interfaces, so users can easily integrate its functionality into a variety of tasks. DaiSy is open-sourced and available at: https://github.com/MChatzakis/DaiSy.
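To make "exact" concrete, the sketch below shows the simplest exact method: a sequential scan that returns the k nearest series under Euclidean distance, using early abandoning to stop computing a distance once it already exceeds the current k-th best. This is not DaiSy's API. The function names `exact_knn` and `squared_euclidean_early_abandon` are hypothetical, and the choice of Euclidean distance on equal-length, finite-valued series is an assumption made for illustration. Any exact method must return the same neighbors as this scan (up to ties). Faster exact methods differ only in how much of the collection they manage to skip.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple


def squared_euclidean_early_abandon(query: Sequence[float],
                                    candidate: Sequence[float],
                                    threshold: float) -> Optional[float]:
    """Return the squared Euclidean distance, or None as soon as the partial
    sum exceeds `threshold` (such a candidate cannot enter the result)."""
    total = 0.0
    for q, c in zip(query, candidate):
        diff = q - c
        total += diff * diff
        if total > threshold:
            return None  # abandoned: the answer stays exact
    return total


def exact_knn(collection: List[Sequence[float]],
              query: Sequence[float],
              k: int) -> List[Tuple[int, float]]:
    """Hypothetical helper: exact k-NN by sequential scan.

    Returns (index, Euclidean distance) pairs sorted by increasing distance.
    Assumes all series have the query's length and contain finite values.
    """
    if k <= 0:
        return []
    # Max-heap of the k best so far, stored as (-squared_distance, index).
    best: List[Tuple[float, int]] = []
    for idx, series in enumerate(collection):
        if len(series) != len(query):
            raise ValueError("all series must have the query's length")
        # Current k-th best squared distance; infinite until the heap is full.
        bsf = -best[0][0] if len(best) == k else math.inf
        dist = squared_euclidean_early_abandon(query, series, bsf)
        if dist is None:
            continue
        if len(best) < k:
            heapq.heappush(best, (-dist, idx))
        elif dist < bsf:
            heapq.heapreplace(best, (-dist, idx))  # evict the current k-th best
    return sorted(((idx, math.sqrt(-neg)) for neg, idx in best),
                  key=lambda pair: (pair[1], pair[0]))
```

For example, with `data = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [3.0, 4.0, 0.0], [0.5, 0.5, 0.5]]`, calling `exact_knn(data, [1.0, 1.0, 1.0], k=2)` returns the series at indices 1 and 3. The third series is abandoned after its first coordinate, because its partial squared distance (4.0) already exceeds the current second-best (3.0).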