Spike Hijacking in Late-Interaction Retrieval

📅 2026-04-06
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses critical limitations of the widely adopted hard max-similarity (MaxSim) pooling in late interaction retrieval, notably gradient concentration and sensitivity to document length. We first uncover the gradient spiking phenomenon induced by MaxSim, elucidate its structural bias in gradient routing, and highlight an inherent trade-off between sparsity and robustness. To mitigate these issues, we propose smooth pooling alternatives and conduct a systematic evaluation incorporating in-batch contrastive training, top-k pooling, softmax aggregation, and multi-vector retrieval benchmarks. Experimental results demonstrate that MaxSim suffers significant performance degradation under varying document lengths, whereas smooth pooling methods exhibit markedly improved robustness, confirming that gradient concentration constitutes a fundamental flaw in late interaction models.
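The gradient-spiking claim above can be illustrated with a tiny numerical sketch (the similarity values are illustrative, not taken from the paper): the hard max routes its entire gradient to the single argmax document token, whereas a smooth-max such as log-sum-exp, whose exact gradient is the softmax distribution, spreads the signal across tokens.

```python
import numpy as np

# Similarities of one query token to five document tokens (illustrative values).
s = np.array([0.90, 0.85, 0.20, 0.10, 0.05])

# Hard max: d(max s)/ds_j is one-hot at the argmax, so a single
# document token receives the entire gradient signal.
grad_hard = np.eye(len(s))[np.argmax(s)]

# Smooth max via log-sum-exp: f(s) = tau * log(sum(exp(s / tau))).
# Its exact gradient is the softmax distribution, which spreads the
# signal across tokens in proportion to their similarity.
tau = 0.1
grad_smooth = np.exp(s / tau)
grad_smooth /= grad_smooth.sum()

print(grad_hard)    # one-hot: all gradient on the best-matching token
print(grad_smooth)  # mass shared between the two near-ties at tau = 0.1
```

Lowering `tau` recovers the hard-max behavior, which is the sparsity-robustness dial the summary alludes to.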
📝 Abstract
Late-interaction retrieval models rely on hard maximum similarity (MaxSim) to aggregate token-level similarities. Although effective, this winner-take-all pooling rule may structurally bias training dynamics. We provide a mechanistic study of gradient routing and robustness in MaxSim-based retrieval. In a controlled synthetic environment with in-batch contrastive training, we demonstrate that MaxSim induces significantly higher patch-level gradient concentration than smoother alternatives such as Top-k pooling and softmax aggregation. While sparse routing can improve early discrimination, it also increases sensitivity to document length: as the number of document patches grows, MaxSim degrades more sharply than mild smoothing variants. We corroborate these findings on a real-world multi-vector retrieval benchmark, where controlled document-length sweeps reveal similar brittleness under hard max pooling. Together, our results isolate pooling-induced gradient concentration as a structural property of late-interaction retrieval and highlight a sparsity-robustness tradeoff. These findings motivate principled alternatives to hard max pooling in multi-vector retrieval systems.
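As a minimal sketch of the three pooling rules the abstract compares (an assumption-laden illustration, not the authors' implementation), each query token scores against all document tokens and the per-token maxima, top-k means, or softmax-weighted sums are summed:

```python
import numpy as np

def maxsim_score(Q, D):
    """Hard MaxSim: each query token keeps only its single best document token."""
    S = Q @ D.T                      # token-level similarities, shape (nq, nd)
    return S.max(axis=1).sum()

def topk_score(Q, D, k=4):
    """Top-k pooling: average each query token's k best document similarities."""
    S = Q @ D.T
    topk = np.sort(S, axis=1)[:, -k:]
    return topk.mean(axis=1).sum()

def softmax_score(Q, D, tau=0.1):
    """Softmax aggregation: temperature-controlled soft attention over doc tokens."""
    S = Q @ D.T
    W = np.exp(S / tau)
    W /= W.sum(axis=1, keepdims=True)
    return (W * S).sum(axis=1).sum()
```

Since top-k averaging and softmax weighting can never exceed the per-row maximum, both smooth variants lower-bound the MaxSim score while distributing credit over more document tokens.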
Problem

Research questions and friction points this paper addresses.

late-interaction retrieval
MaxSim
gradient concentration
document length sensitivity
pooling sparsity
Innovation

Methods, ideas, or system contributions that make the work stand out.

late-interaction retrieval
MaxSim pooling
gradient concentration
sparsity-robustness tradeoff
multi-vector retrieval
Karthik Suresh
Adobe, 345 Park Ave, San Jose, CA 95110, USA
Tushar Vatsa
Adobe
Machine Learning
Tracy King
Adobe, 345 Park Ave, San Jose, CA 95110, USA
Asim Kadav
Adobe
deep learning, machine learning, systems, machine reasoning, LLM training and inference
Michael Friedrich
Adobe, 345 Park Ave, San Jose, CA 95110, USA