AI Summary
Cosine similarity, the de facto standard for embedding comparison, ignores semantic information encoded in vector magnitudes, rendering it inadequate in magnitude-sensitive scenarios (e.g., intensity, confidence, or granularity representation). Method: From a geometric-semantic coupling perspective, we systematically expose its implicit "direction-only semantics" assumption, a previously unrecognized bias, and advance the core thesis that "magnitude is meaning." Through theoretical analysis, geometric modeling, adversarial counterexample construction, and controlled experiments (L2-normalized vs. raw embeddings), we characterize its dual nature: robustness under directional alignment yet fragility under magnitude semantics. Contribution/Results: Our work shifts similarity design from pure directional modeling toward joint direction-magnitude modeling; provides an interpretable, semantics-aware metric selection framework for semantic retrieval, alignment, and evaluation; and establishes *Norm-aware Similarity* as a novel paradigm in embedding-based similarity learning.
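A minimal sketch (in NumPy, not taken from the article) of the "magnitude is meaning" point: two embeddings with identical direction but different norms are indistinguishable under cosine similarity, while a magnitude-aware score such as the raw dot product separates them. The specific vectors and the confidence interpretation are illustrative assumptions.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity: depends only on direction, not magnitude."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings pointing the same way but with very different
# norms -- if norm encoded confidence, these would be semantically distinct.
weak = np.array([0.1, 0.2, 0.2])
strong = 10.0 * weak  # same direction, 10x the magnitude

cos = cosine_similarity(weak, strong)  # direction-only view: maximal (1.0)
dot = float(np.dot(weak, strong))      # magnitude-aware view: reflects scale
```

L2-normalizing both vectors before taking a dot product reproduces the cosine score exactly, which is why the normalized-vs-raw comparison in the summary isolates the effect of magnitude.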
Abstract
Cosine similarity has become a standard metric for comparing embeddings in modern machine learning. Its scale-invariance and alignment with model training objectives have contributed to its widespread adoption. However, recent studies have revealed important limitations, particularly when embedding norms carry meaningful semantic information. This informal article offers a reflective and selective examination of the evolution, strengths, and limitations of cosine similarity. We highlight why it performs well in many settings, where it tends to break down, and how emerging alternatives are beginning to address its blind spots. We hope to offer a mix of conceptual clarity and practical perspective, especially for quantitative scientists who think about embeddings not just as vectors, but as geometric and philosophical objects.