SemBench: A Benchmark for Semantic Query Processing Engines

📅 2025-11-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing semantic query engines lack a systematic, multimodal evaluation benchmark. Method: this paper introduces the first benchmark framework tailored to LLM-driven semantic query processing. The framework is designed orthogonally along three dimensions (application scenarios; data modalities: text, image, audio; and semantic operators: filter, join, map, sort, classify), goes beyond traditional SQL constraints, and supports the evaluation of generative and reasoning-based queries under natural-language instructions. Contribution/Results: the benchmark encompasses diverse tasks and real-world multimodal datasets and is empirically validated on four state-of-the-art systems. The results reveal substantial disparities in accuracy, efficiency, and cross-scenario generalization, establishing a reproducible, extensible, and standardized evaluation tool to advance research and development on semantic query engines.

📝 Abstract
We present a benchmark targeting a novel class of systems: semantic query processing engines. These systems rely on the generative and reasoning capabilities of state-of-the-art large language models (LLMs). They extend SQL with semantic operators, configured by natural-language instructions and evaluated via LLMs, that enable users to perform various operations on multimodal data. Our benchmark introduces diversity across three key dimensions: scenarios, modalities, and operators. Included are scenarios ranging from movie review analysis to medical question answering. Within these scenarios, we cover different data modalities, including images, audio, and text. Finally, the queries involve a diverse set of operators, including semantic filter, join, mapping, ranking, and classification operators. We evaluated our benchmark on three academic systems (LOTUS, Palimpzest, and ThalamusDB) and one industrial system, Google BigQuery. Although these results reflect a snapshot of systems under continuous development, our study offers crucial insights into their current strengths and weaknesses, illuminating promising directions for future research.
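To make the idea of a semantic operator concrete, here is a minimal sketch of a semantic filter over tabular rows. It is not the API of any of the evaluated systems; `llm_judge` is a hypothetical stand-in for the LLM call that a real engine would issue per row, replaced here by a keyword check so the example runs standalone.

```python
# Minimal sketch of a semantic filter: each row is kept or dropped based
# on an LLM's yes/no answer to a natural-language predicate.

def llm_judge(instruction: str, value: str) -> bool:
    # Hypothetical placeholder: a real engine would prompt an LLM with
    # the instruction and the cell value. Faked here for illustration.
    return "positive" in value.lower()

def sem_filter(rows, column, instruction):
    """Keep rows for which the LLM answers 'yes' to the instruction."""
    return [row for row in rows if llm_judge(instruction, row[column])]

reviews = [
    {"id": 1, "text": "A positive, heartfelt movie."},
    {"id": 2, "text": "Dull and forgettable."},
]
kept = sem_filter(reviews, "text", "Is this movie review positive?")
```

In the benchmarked systems, the same pattern appears as an operator inside an extended SQL or dataframe query rather than a plain Python function.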
Problem

Research questions and friction points this paper is trying to address.

Benchmarking semantic query engines using LLMs
Evaluating multimodal data processing across scenarios
Assessing diverse semantic operators in query systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends SQL with semantic operators via LLMs
Supports multimodal data including images and audio
Evaluates diverse semantic filter, join, and ranking operators
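Among the operators listed above, the semantic join is the least SQL-like: rows from two tables are paired when an LLM judges that they satisfy a natural-language matching condition, rather than an equality predicate. The sketch below illustrates the shape of such an operator; `llm_match` is a hypothetical stand-in for the per-pair LLM call, replaced by a substring heuristic so the example runs standalone.

```python
# Minimal sketch of a semantic join: a nested-loop join where the
# predicate is a natural-language instruction judged by an LLM.

def llm_match(instruction: str, left: str, right: str) -> bool:
    # Hypothetical placeholder for an LLM call; trivial heuristic here.
    return left.lower() in right.lower()

def sem_join(left_rows, right_rows, left_col, right_col, instruction):
    """Pair rows that the LLM judges to satisfy the instruction."""
    return [
        (l, r)
        for l in left_rows
        for r in right_rows
        if llm_match(instruction, l[left_col], r[right_col])
    ]

symptoms = [{"symptom": "fever"}]
notes = [
    {"note": "Patient reports fever and cough."},
    {"note": "No complaints today."},
]
pairs = sem_join(symptoms, notes, "symptom", "note",
                 "Does the note mention the symptom?")
```

The nested loop makes the cost concern behind the benchmark's efficiency results visible: a naive semantic join issues one LLM call per row pair.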