Spike Hijacking in Late-Interaction Retrieval

📅 2026-04-06
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses critical limitations of the widely adopted hard max-similarity (MaxSim) pooling in late interaction retrieval, notably gradient concentration and sensitivity to document length. We first uncover the gradient spiking phenomenon induced by MaxSim, elucidate its structural bias in gradient routing, and highlight an inherent trade-off between sparsity and robustness. To mitigate these issues, we propose smooth pooling alternatives and conduct a systematic evaluation incorporating in-batch contrastive training, top-k pooling, softmax aggregation, and multi-vector retrieval benchmarks. Experimental results demonstrate that MaxSim suffers significant performance degradation under varying document lengths, whereas smooth pooling methods exhibit markedly improved robustness, confirming that gradient concentration constitutes a fundamental flaw in late interaction models.
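The gradient-spiking claim above can be illustrated with a tiny numerical sketch (the similarity values are illustrative, not taken from the paper): the hard max routes its entire gradient to the single argmax document token, whereas a smooth-max such as log-sum-exp, whose exact gradient is the softmax distribution, spreads the signal across tokens.

```python
import numpy as np

# Similarities of one query token to five document tokens (illustrative values).
s = np.array([0.90, 0.85, 0.20, 0.10, 0.05])

# Hard max: d(max s)/ds_j is one-hot at the argmax, so a single
# document token receives the entire gradient signal.
grad_hard = np.eye(len(s))[np.argmax(s)]

# Smooth max via log-sum-exp: f(s) = tau * log(sum(exp(s / tau))).
# Its exact gradient is the softmax distribution, which spreads the
# signal across tokens in proportion to their similarity.
tau = 0.1
grad_smooth = np.exp(s / tau)
grad_smooth /= grad_smooth.sum()

print(grad_hard)    # one-hot: all gradient on the best-matching token
print(grad_smooth)  # mass shared between the two near-ties at tau = 0.1
```

Lowering `tau` recovers the hard-max behavior, which is the sparsity-robustness dial the summary alludes to.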
📝 Abstract
Late-interaction retrieval models rely on hard maximum similarity (MaxSim) to aggregate token-level similarities. Although effective, this winner-take-all pooling rule may structurally bias training dynamics. We provide a mechanistic study of gradient routing and robustness in MaxSim-based retrieval. In a controlled synthetic environment with in-batch contrastive training, we demonstrate that MaxSim induces significantly higher patch-level gradient concentration than smoother alternatives such as Top-k pooling and softmax aggregation. While sparse routing can improve early discrimination, it also increases sensitivity to document length: as the number of document patches grows, MaxSim degrades more sharply than mild smoothing variants. We corroborate these findings on a real-world multi-vector retrieval benchmark, where controlled document-length sweeps reveal similar brittleness under hard max pooling. Together, our results isolate pooling-induced gradient concentration as a structural property of late-interaction retrieval and highlight a sparsity-robustness tradeoff. These findings motivate principled alternatives to hard max pooling in multi-vector retrieval systems.
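As a minimal sketch of the three pooling rules the abstract compares (an assumption-laden illustration, not the authors' implementation), each query token scores against all document tokens and the per-token maxima, top-k means, or softmax-weighted sums are summed:

```python
import numpy as np

def maxsim_score(Q, D):
    """Hard MaxSim: each query token keeps only its single best document token."""
    S = Q @ D.T                      # token-level similarities, shape (nq, nd)
    return S.max(axis=1).sum()

def topk_score(Q, D, k=4):
    """Top-k pooling: average each query token's k best document similarities."""
    S = Q @ D.T
    topk = np.sort(S, axis=1)[:, -k:]
    return topk.mean(axis=1).sum()

def softmax_score(Q, D, tau=0.1):
    """Softmax aggregation: temperature-controlled soft attention over doc tokens."""
    S = Q @ D.T
    W = np.exp(S / tau)
    W /= W.sum(axis=1, keepdims=True)
    return (W * S).sum(axis=1).sum()
```

Since top-k averaging and softmax weighting can never exceed the per-row maximum, both smooth variants lower-bound the MaxSim score while distributing credit over more document tokens.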
Problem

Research questions and friction points this paper addresses.

late-interaction retrieval
MaxSim
gradient concentration
document length sensitivity
pooling sparsity
Innovation

Methods, ideas, or system contributions that make the work stand out.

late-interaction retrieval
MaxSim pooling
gradient concentration
sparsity-robustness tradeoff
multi-vector retrieval
Karthik Suresh
Adobe, 345 Park Ave, San Jose, CA 95110, USA
Tushar Vatsa
Adobe
Machine Learning
Tracy King
Adobe, 345 Park Ave, San Jose, CA 95110, USA
Asim Kadav
Adobe
deep learning, machine learning, systems, machine reasoning, LLM training and inference
Michael Friedrich
Adobe, 345 Park Ave, San Jose, CA 95110, USA