Context-Aware Search and Retrieval Under Token Erasure

📅 2026-04-20

🤖 AI Summary
This work addresses the reliability of retrieval-augmented generation (RAG) systems under query token erasure, in which part of the query representation is lost. The authors propose a semantic importance-aware adaptive redundancy allocation mechanism: queries are represented by term-frequency features, and documents are retrieved using TF-IDF-weighted similarity. The retrieval error probability is characterized from an information-theoretic perspective by showing that the vector of similarity margins is approximately multivariate Gaussian, which yields an explicit approximation and computable upper bounds. Both the analysis and the experiments indicate that allocating higher redundancy to semantically more important query features improves retrieval reliability, a principle that also carries over to embedding-based retrieval on real-world data.
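To make the error characterization concrete, the following is a generic sketch of how a Gaussian margin approximation leads to a computable bound. The notation ($d^\star$, $\Delta_j$, $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, $M$) is assumed here for illustration and is not taken from the paper; the paper's exact bound may differ from the plain union bound shown.

```latex
% Assumed notation: \hat{q} is the query after erasures, d^\star the correct
% document, d_1,\dots,d_M the competing documents, s(\cdot,\cdot) the similarity.
\Delta_j = s(\hat{q}, d^\star) - s(\hat{q}, d_j), \qquad j = 1, \dots, M,
\qquad
P_e = \Pr\!\left[\, \exists\, j : \Delta_j \le 0 \,\right].

% If the margin vector is approximately Gaussian,
% \boldsymbol{\Delta} \approx \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}),
% a union bound over the competitors gives
P_e \;\le\; \sum_{j=1}^{M} \Pr\!\left[ \Delta_j \le 0 \right]
\;\approx\; \sum_{j=1}^{M} Q\!\left( \frac{\mu_j}{\sqrt{\Sigma_{jj}}} \right),
\qquad
Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt .
```

Under this reading, extra redundancy on an important feature raises the mean margins $\mu_j$ relative to their spread, which shrinks each $Q(\cdot)$ term.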

📝 Abstract
This paper introduces and analyzes a search and retrieval model for RAG-like systems under token erasures. We provide an information-theoretic analysis of remote document retrieval when query representations are only partially preserved. The query is represented using term-frequency-based features, and semantically adaptive redundancy is assigned according to feature importance. Retrieval is performed using TF-IDF-weighted similarity. We characterize the retrieval error probability by showing that the vector of similarity margins converges to a multivariate Gaussian distribution, yielding an explicit approximation and computable upper bounds. Numerical results support the analysis, while a separate data-driven evaluation using embedding-based retrieval on real-world data shows that the same importance-aware redundancy principles extend to modern retrieval pipelines. Overall, the results show that assigning higher redundancy to semantically important query features improves retrieval reliability.
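The pipeline described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: redundancy is modeled as simple repetition (a token survives if any of its copies survives), importance is taken to be the IDF weight, erasures are independent with probability `p`, and the corpus, query, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Toy corpus and query: hypothetical data, not taken from the paper.
DOCS = [
    ["erasure", "coding", "channel", "the"],
    ["retrieval", "query", "document", "the"],
    ["gaussian", "margin", "bound", "the"],
]
QUERY = ["the", "retrieval", "query"]


def idf_weights(docs):
    """Smoothed inverse document frequency, log(N / df) + 1, per term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the corpus get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def allocate_redundancy(query, idf, budget, p):
    """Assumed heuristic: one copy per token, then give each spare copy to the
    token with the largest expected lost weight, idf * p**copies."""
    if budget < len(query):
        raise ValueError("budget must cover at least one copy per token")
    copies = [1] * len(query)
    for _ in range(budget - len(query)):
        best = max(range(len(query)),
                   key=lambda i: idf.get(query[i], 0.0) * p ** copies[i])
        copies[best] += 1
    return copies


def transmit(query, copies, p, rng):
    """Erase each copy independently with probability p; keep a token if at
    least one of its copies survives."""
    return [tok for tok, r in zip(query, copies)
            if any(rng.random() >= p for _ in range(r))]


def retrieve(received, docs, idf):
    """Index of the document with the highest TF-IDF cosine similarity."""
    q = tfidf_vector(received, idf)
    scores = [cosine(q, tfidf_vector(d, idf)) for d in docs]
    return max(range(len(docs)), key=scores.__getitem__)


def error_rate(query, copies, docs, idf, target, p, trials=5000, seed=0):
    """Monte Carlo estimate of the probability of retrieving a wrong document."""
    rng = random.Random(seed)
    errors = sum(retrieve(transmit(query, copies, p, rng), docs, idf) != target
                 for _ in range(trials))
    return errors / trials


if __name__ == "__main__":
    idf = idf_weights(DOCS)
    p, budget = 0.5, 6
    uniform = [budget // len(QUERY)] * len(QUERY)
    adaptive = allocate_redundancy(QUERY, idf, budget, p)
    for name, copies in [("uniform", uniform), ("importance-aware", adaptive)]:
        print(name, copies, error_rate(QUERY, copies, DOCS, idf, target=1, p=p))
```

Running the script compares a uniform allocation with the importance-aware one at the same total budget. The greedy rule is only one plausible way to spend the budget; any allocation that favors high-weight terms could be swapped in for `allocate_redundancy`.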
Problem

Research questions and friction points this paper is trying to address.

token erasure
context-aware retrieval
query representation
retrieval reliability
RAG systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

token erasure
semantic redundancy
TF-IDF retrieval
information-theoretic analysis
importance-aware encoding
Sara Ghasvarianjahromi
Helen and John C. Hartmann Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, New Jersey, 07102, USA
Joshua Barr
Helen and John C. Hartmann Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, New Jersey, 07102, USA
Yauhen Yakimenka
Postdoctoral Research Associate, New Jersey Institute of Technology
coding theory, information theory, private information retrieval, compressed sensing
Jörg Kliewer
Helen and John C. Hartmann Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, New Jersey, 07102, USA