Explainable Semantic Textual Similarity via Dissimilar Span Detection

📅 2026-03-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the limited interpretability of existing semantic textual similarity (STS) methods, which typically output a single similarity score without indicating which text spans drive the judgment. To improve interpretability, the authors introduce the task of Dissimilar Span Detection (DSD), which aims to identify the spans that differ semantically between the two texts of a pair. They release the Span Similarity Dataset (SSD), built through a semi-automated pipeline that combines large language models (LLMs) with human verification. Among the evaluated baselines, LLM-based and supervised approaches achieve the highest performance, although overall results remain low, which highlights the difficulty of the task. An additional experiment shows that DSD can improve performance on paraphrase detection.

📝 Abstract

Semantic Textual Similarity (STS) is a crucial component of many Natural Language Processing (NLP) applications. However, existing approaches typically reduce semantic nuances to a single score, limiting interpretability. To address this, we introduce the task of Dissimilar Span Detection (DSD), which aims to identify semantically differing spans between pairs of texts. This can help users understand which particular words or tokens negatively affect the similarity score, or be used to improve performance in STS-dependent downstream tasks. Furthermore, we release a new dataset suitable for the task, the Span Similarity Dataset (SSD), developed through a semi-automated pipeline combining large language models (LLMs) with human verification. We propose and evaluate different baseline methods for DSD, both unsupervised, based on LIME, SHAP, LLMs, and our own method, as well as an additional supervised approach. While LLMs and supervised models achieve the highest performance, overall results remain low, highlighting the complexity of the task. Finally, we set up an additional experiment that shows how DSD can lead to increased performance in the specific task of paraphrase detection.
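
To make the input and output of the DSD task concrete, here is a minimal sketch that pairs up the token runs that differ between two texts. It is a purely lexical stand-in built on Python's `difflib` and is not one of the baselines described in the abstract (LIME, SHAP, LLMs, the authors' own method, or the supervised model). The function name `dissimilar_spans` and the whitespace tokenization are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def dissimilar_spans(text_a, text_b):
    """Return (span_a, span_b) pairs of token runs that differ between two texts.

    Hypothetical helper: it compares surface tokens only, so it knows
    nothing about meaning. A real DSD system would need a semantic model.
    """
    # Assumption: simple whitespace tokenization.
    tokens_a = text_a.split()
    tokens_b = text_b.split()

    matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    spans = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical token runs are not dissimilar spans
        # An empty string on one side means the span exists only in the other text.
        span_a = " ".join(tokens_a[a_start:a_end])
        span_b = " ".join(tokens_b[b_start:b_end])
        spans.append((span_a, span_b))
    return spans


if __name__ == "__main__":
    pairs = dissimilar_spans("The cat sat on the mat", "The dog sat on the rug")
    for span_a, span_b in pairs:
        print(f"{span_a!r} <-> {span_b!r}")
```

Because this sketch only compares surface tokens, it would flag a paraphrase that uses different wording as dissimilar and would miss differences in meaning that share the same words. That gap is the reason the task calls for semantic methods rather than a string diff.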

Problem

Research questions and friction points this paper is trying to address.

Semantic Textual Similarity

Explainability

Dissimilar Span Detection

Interpretability

Natural Language Processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dissimilar Span Detection

Explainable Semantic Textual Similarity

Span Similarity Dataset