KathDB: Explainable Multimodal Database Management System with Human-AI Collaboration

📅 2025-12-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional DBMSs struggle to support unified querying over multimodal data (text, images, video), constrained both by SQL’s limited expressiveness for unstructured data and by the usability–interpretability trade-off in existing approaches: manual ML-based UDF implementation versus opaque, black-box LLM integration. This paper introduces the first interpretable multimodal database system that synergistically unifies relational semantics with large language model (LLM) reasoning. Our approach extends relational algebra with multimodal operators, aligns heterogeneous embeddings across modalities, provides a pluggable LLM interface, generates visual explanations, and supports interactive query refinement via human-in-the-loop protocols. Evaluated on cross-modal benchmarks, our system achieves 92% query accuracy, 87% explanation fidelity, and reduces user task completion time by 41%, significantly outperforming pure-SQL and black-box LLM baselines.

Technology Category

Application Category

📝 Abstract
Traditional DBMSs execute user- or application-provided SQL queries over relational data with strong semantic guarantees and advanced query optimization, but writing complex SQL is hard and focuses only on structured tables. Contemporary multimodal systems (which operate over relations but also text, images, and even videos) either expose low-level controls that force users to use (and possibly create) machine learning UDFs manually within SQL or offload execution entirely to black-box LLMs, sacrificing usability or explainability. We propose KathDB, a new system that combines relational semantics with the reasoning power of foundation models over multimodal data. Furthermore, KathDB includes human-AI interaction channels during query parsing, execution, and result explanation, such that users can iteratively obtain explainable answers across data modalities.
Problem

Research questions and friction points this paper is trying to address.

KathDB addresses the difficulty of writing complex SQL queries for multimodal data.
It overcomes the trade-off between usability and explainability in existing multimodal systems.
The system integrates relational semantics with foundation models for explainable multimodal querying.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines relational semantics with foundation models
Integrates human-AI interaction in query processing
Provides explainable answers across multimodal data
🔎 Similar Papers