Dataset of News Articles with Provenance Metadata for Media Relevance Assessment

📅 2025-06-11
🤖 AI Summary
Problem: Existing methods for detecting out-of-context and misattributed images in misinformation often rely only on semantic alignment between image and text, so they miss manipulation whenever the depicted content loosely matches the narrative. Method: We introduce a news image-text dataset annotated with structured provenance metadata (location, timestamp, source), and formulate media provenance assessment as judging whether an image's place and time of origin are relevant to the article. We define two subtasks: Location of Origin Relevance (LOR) and Date and Time of Origin Relevance (DTOR). We evaluate six large language models zero-shot on inputs that combine news text, image captions, and provenance metadata. Contribution/Results: LOR accuracy reaches 68.3%, while DTOR performance remains consistently below 42%, indicating that temporal provenance is the harder task. The work moves beyond the semantic-alignment paradigm, provides baseline results for both tasks, and leaves room for architectures specialized for provenance modeling.

📝 Abstract
Out-of-context and misattributed imagery is the leading form of media manipulation in today's misinformation and disinformation landscape. Existing methods for detecting this practice often consider only whether the semantics of the imagery correspond to the text narrative, so they miss manipulation as long as the depicted objects or scenes roughly match the narrative at hand. To tackle this, we introduce the News Media Provenance Dataset, a dataset of news articles with provenance-tagged images. We formulate two tasks on this dataset, location of origin relevance (LOR) and date and time of origin relevance (DTOR), and present baseline results on six large language models (LLMs). We find that, while zero-shot performance on LOR is promising, performance on DTOR lags behind, leaving room for specialized architectures and future work.
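To make the two task formulations concrete, here is a minimal Python sketch of how a provenance-tagged example might be turned into a zero-shot LOR or DTOR prompt. Everything in it is an assumption for illustration: the `ProvenanceRecord` field names, the question wording, and the yes/no answer format are hypothetical and are not taken from the dataset or the paper's prompts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record layout; not the dataset's actual schema."""
    article_text: str
    image_caption: str
    location_of_origin: Optional[str] = None  # free-text place name (assumed)
    datetime_of_origin: Optional[str] = None  # ISO 8601 string (assumed)


# Assumed question wording for each task; the paper's prompts may differ.
TASK_QUESTIONS = {
    "LOR": "Is the image's location of origin relevant to the article?",
    "DTOR": "Is the image's date and time of origin relevant to the article?",
}


def build_prompt(record: ProvenanceRecord, task: str) -> str:
    """Build a zero-shot prompt that exposes only the metadata the task needs."""
    if task not in TASK_QUESTIONS:
        raise ValueError(f"unknown task: {task!r}")
    provenance = (
        record.location_of_origin if task == "LOR" else record.datetime_of_origin
    )
    lines = [
        f"Article: {record.article_text}",
        f"Image caption: {record.image_caption}",
        f"Provenance ({task}): {provenance or 'unknown'}",
        f"Question: {TASK_QUESTIONS[task]}",
        "Answer with exactly one word: yes or no.",
    ]
    return "\n".join(lines)
```

Keeping the location and the timestamp in separate prompts is a design choice of this sketch: it lets each task be scored on its own, so a model's handling of place cannot mask its handling of time.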
Problem

Research questions and friction points this paper is trying to address.

Detecting out-of-context and misattributed imagery in news articles
Assessing media relevance via provenance metadata (LOR and DTOR tasks)
Evaluating LLM performance on location and date-time origin relevance in misinformation
Innovation

Methods, ideas, or system contributions that make the work stand out.

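A dataset of news articles with provenance-tagged images for media relevance assessment
LOR and DTOR tasks for assessing whether an image's origin is relevant to its article
Baseline results on six LLMs show that DTOR is harder than LOR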
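
A baseline comparison like the one listed above needs some way to score free-form model replies against gold relevance labels. The sketch below shows one simple option in Python. The yes/no reply format, the `parse_answer` and `accuracy` helpers, and the rule that unparseable replies count as wrong are all assumptions for illustration; the page does not describe the paper's actual scoring protocol.

```python
from typing import Iterable, Optional, Tuple


def parse_answer(raw: str) -> Optional[bool]:
    """Map a reply to True (relevant), False (not relevant), or None."""
    words = raw.strip().lower().split()
    if not words:
        return None
    # Only the first word is inspected, with surrounding punctuation removed.
    first = words[0].strip(".,:;!\"'")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


def accuracy(pairs: Iterable[Tuple[str, bool]]) -> float:
    """Fraction of (raw_reply, gold_label) pairs answered correctly.

    Unparseable replies are counted as wrong (an assumption of this sketch).
    """
    items = list(pairs)
    if not items:
        raise ValueError("no examples to score")
    correct = sum(parse_answer(raw) == gold for raw, gold in items)
    return correct / len(items)
```

Scoring LOR and DTOR replies with the same function, on separate lists of examples, keeps the two accuracies directly comparable.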