🤖 AI Summary
This work addresses the challenge of optimizing semantic predicates in AI-driven SQL query processing, which suffer from high inference overhead and opaque black-box behavior that hinder efficient querying over unstructured data. To overcome these limitations, the authors propose Larch, a novel framework that, for the first time, models semantic filtering expressions as graph structures. Larch comes in two variants. Larch-A2C encodes the filter expression with an embedding-enhanced gated graph neural network and treats the choice of evaluation order as a Markov decision process solved with reinforcement learning. Larch-Sel uses a supervised model to predict filter selectivities and then applies dynamic programming to choose the evaluation order. Evaluated on both real-world and synthetic workloads, both variants use fewer tokens than existing semantic filter optimization techniques, reducing total token cost overhead by 3–19× compared to state-of-the-art systems such as Palimpzest and Quest.
📝 Abstract
With the advent of Large Language Models (LLMs), many database systems have introduced semantic operators that enable analytical queries over unstructured data (e.g., text, images, videos). Semantic operators typically incur high inference costs and latencies, making semantic (AI) SQL queries challenging to apply to large-scale datasets. At the same time, their semantic nature leads database engines to treat them as black boxes, making AI SQL queries difficult to optimize. In this paper, we introduce Larch, a framework for optimizing the execution of semantic filters in AI SQL queries. Larch is inspired by two key observations: i) the high latency of semantic operators leaves significant room for computationally heavy runtime optimization techniques, and ii) unstructured data are typically accompanied by semantic information in the form of embeddings, allowing for efficient semantic comparisons between AI_FILTER prompts and data values. Based on these two observations, we present two Larch variants: Larch-A2C and Larch-Sel. Larch-A2C encodes arbitrary semantic filter expression trees using an embedding-augmented Gated Graph Neural Network and formulates the choice of filter evaluation order as a Markov decision process. In contrast, Larch-Sel leverages a supervised learning model to predict filter selectivities, and then applies dynamic programming to find a near-optimal evaluation order for each input row. Evaluated across diverse real-world datasets and comprehensive synthetic workloads, both Larch variants always outperform existing semantic filter optimization techniques in terms of token usage. Our results demonstrate that Larch is robust across diverse workloads, reducing total token cost overhead by 3x-19x compared to Palimpzest and Quest.
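To illustrate why predicted selectivities matter for evaluation order, the following is a minimal sketch and not the authors' implementation. It assumes a flat conjunction (AND) of independent filters, whereas the abstract describes arbitrary expression trees handled with dynamic programming. The filter names, per-filter token costs, and selectivities are hypothetical values chosen for illustration, and the rank heuristic shown is the classical ordering rule for independent predicates, swapped in here for the paper's own procedure.

```python
from itertools import permutations


def expected_cost(order, cost, sel):
    """Expected token cost of evaluating a short-circuit AND in `order`.

    A filter is only evaluated if every earlier filter passed, so its cost is
    weighted by the probability that the row reaches it.
    """
    total = 0.0
    reach_prob = 1.0  # probability that the row reaches the current filter
    for f in order:
        total += reach_prob * cost[f]
        reach_prob *= sel[f]  # sel[f] = predicted probability that f passes
    return total


def best_order_bruteforce(cost, sel):
    """Exhaustively search all orders (only feasible for a handful of filters)."""
    return min(permutations(cost), key=lambda o: expected_cost(o, cost, sel))


def best_order_by_rank(cost, sel):
    """Sort by cost / (1 - selectivity): cheap, highly selective filters first."""
    return tuple(sorted(cost, key=lambda f: cost[f] / max(1.0 - sel[f], 1e-9)))


# Hypothetical per-filter token costs and predicted pass probabilities.
cost = {"f1": 120.0, "f2": 40.0, "f3": 300.0}
sel = {"f1": 0.9, "f2": 0.5, "f3": 0.1}

if __name__ == "__main__":
    naive = ("f1", "f2", "f3")
    ranked = best_order_by_rank(cost, sel)
    print("naive order :", naive, expected_cost(naive, cost, sel))
    print("ranked order:", ranked, expected_cost(ranked, cost, sel))
```

In this toy setting the rank-based order matches the exhaustive search and costs fewer expected tokens than the query's written order. The sketch only shows the cost model that makes ordering worthwhile; it says nothing about how Larch obtains selectivities from embeddings or how it handles nested AND/OR expressions.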