ArtiFact: A Large-Scale Multi-Modal Cultural Heritage Dataset

📅 2026-06-08

🤖 AI Summary

This work addresses the scarcity of large-scale, real-world multimodal datasets that integrate tables, text, and images by introducing ArtiFact, the first multimodal benchmark dataset tailored to the cultural heritage domain. ArtiFact comprises 651,045 museum records spanning structured data, textual descriptions, and images, sourced from three major museums, and supports two core tasks: cross-modal error detection and semantic querying. The authors define seven fine-grained error categories and establish a framework that covers multi-source data collection, cross-modal alignment, synthetic error injection, and semantic query evaluation. Experimental results show that current methods struggle to reliably detect subtle inconsistencies, such as material or period mismatches, and perform poorly on complex semantic queries, underscoring ArtiFact’s value as a challenging new benchmark for multimodal data management.

📝 Abstract

Multi-modal data management has emerged as a central research topic in the database community, spanning data integration, semantic query processing, and data quality assessment. Despite this growing interest, the community lacks large-scale, real-world datasets combining tables, text, and images. We present ArtiFact, a multi-modal cultural heritage dataset of 651,045 museum records collected from the Metropolitan Museum of Art, the Art Institute of Chicago, and the Rijksmuseum. We demonstrate the utility of ArtiFact through two downstream tasks. For cross-modal error detection, we introduce a curated taxonomy of seven error categories injected into 130,209 records and show that reliably detecting subtle domain-specific errors such as material anachronisms and temporal shifts remains an open challenge. For semantic query processing, we show that current systems struggle with queries involving cultural proximity, ambiguous object types, and historically contingent terminology. Our results position ArtiFact as a challenging benchmark for multi-modal data management research.
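
To make the error-injection setup concrete, here is a minimal Python sketch. Only the two record counts come from the abstract; they imply that errors were injected into exactly one fifth of the dataset (130,209 / 651,045 = 0.2). Everything else is an assumption for illustration and not the authors' pipeline: the `year` field, the 300-year shift, the uniform random sampling, and the function names are all hypothetical.

```python
import random
from copy import deepcopy

# Counts taken from the abstract; everything else below is illustrative.
TOTAL_RECORDS = 651_045
INJECTED_RECORDS = 130_209


def injected_share(injected=INJECTED_RECORDS, total=TOTAL_RECORDS):
    """Fraction of records that carry an injected error."""
    return injected / total


def inject_temporal_shift(records, fraction, shift_years=300, seed=0):
    """Return (corrupted_records, labels) with a year shift applied to a sample.

    `records` is a list of dicts with a hypothetical integer `year` field.
    labels[i] is True when record i was corrupted. The input is not modified.
    """
    rng = random.Random(seed)  # fixed seed so the corrupted subset is reproducible
    n_corrupt = round(len(records) * fraction)
    chosen = set(rng.sample(range(len(records)), n_corrupt))
    corrupted = deepcopy(records)
    for i in chosen:
        # Hypothetical "temporal shift" error: move the object to another era.
        corrupted[i]["year"] += shift_years
    labels = [i in chosen for i in range(len(records))]
    return corrupted, labels


if __name__ == "__main__":
    print(f"share of records with injected errors: {injected_share():.1%}")
    toy = [{"id": i, "year": 1600 + i} for i in range(10)]
    corrupted, labels = inject_temporal_shift(toy, fraction=0.2)
    print(sum(labels), "of", len(toy), "toy records corrupted")
```

Returning the labels alongside the corrupted copy gives a detector a ground truth to be scored against, which is the usual reason for injecting synthetic errors rather than relying on naturally occurring ones.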

Problem

Research questions and friction points this paper is trying to address.

multi-modal data

error detection

semantic query processing

cultural heritage

data benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal dataset

cultural heritage

cross-modal error detection