Learning to Query History: Nonstationary Classification via Learned Retrieval

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of classification models in real-world deployment caused by non-stationary data distributions. It reframes non-stationary classification as a time-series forecasting problem and introduces an end-to-end trainable discrete retrieval mechanism. This mechanism employs input-dependent dynamic queries to selectively retrieve relevant information from labeled samples collected after the initial training period, and integrates a score-based gradient estimator for efficient optimization. By doing so, the method overcomes the conventional limitation that prevents models from leveraging newly annotated post-training data. Empirical results demonstrate substantial improvements in robustness to distribution shifts on both synthetic benchmarks and the Amazon Reviews '23 dataset (Electronics category), with memory consumption scaling predictably with the length of the historical sequence.
📝 Abstract
Nonstationarity is ubiquitous in practical classification settings, leading deployed models to perform poorly even when they generalize well to holdout sets available at training time. We address this by reframing nonstationary classification as time series prediction: rather than predicting from the current input alone, we condition the classifier on a sequence of historical labeled examples that extends beyond the training cutoff. To scale to large sequences, we introduce a learned discrete retrieval mechanism that samples relevant historical examples via input-dependent queries, trained end-to-end with the classifier using a score-based gradient estimator. This enables the full corpus of historical data to remain on an arbitrary filesystem during training and deployment. Experiments on synthetic benchmarks and Amazon Reviews '23 (electronics category) show improved robustness to distribution shift compared to standard classifiers, with VRAM scaling predictably as the length of the historical data sequence increases.
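The abstract's core mechanism is a discrete retriever whose sampling step is trained jointly with the classifier via a score-based (REINFORCE-style) gradient estimator. The paper's actual architecture is not shown here; the following is a minimal NumPy sketch of that general idea under stated assumptions: `hist_x` is a hypothetical bank of historical example features, `W_q` a hypothetical learned query projection, and the "reward" stands in for the downstream classification objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical setup: a bank of historical labeled examples and a
# linear query projection that scores them against the current input.
n_hist, dim = 8, 4
hist_x = rng.normal(size=(n_hist, dim))    # historical example features
W_q = rng.normal(size=(dim, dim)) * 0.1    # learned query projection

def retrieve_and_grad(x, reward_fn):
    """Sample one historical index with input-dependent probabilities and
    return a score-based gradient estimate for W_q:
        grad log p(i | x) * (reward - baseline)."""
    scores = hist_x @ (W_q @ x)            # input-dependent retrieval scores
    p = softmax(scores)
    i = rng.choice(n_hist, p=p)            # discrete retrieval step
    reward = reward_fn(i)
    # Exact expected reward as a variance-reducing baseline (affordable
    # only in this toy setting; real systems use a learned/moving baseline).
    baseline = sum(p[j] * reward_fn(j) for j in range(n_hist))
    # d log p_i / d scores = one_hot(i) - p; chain rule through
    # scores_j = hist_x[j] @ W_q @ x gives an outer-product gradient.
    d_scores = -p.copy()
    d_scores[i] += 1.0
    grad_Wq = np.outer(hist_x.T @ d_scores, x) * (reward - baseline)
    return i, grad_Wq
```

Because the estimator only needs the sampled index and its score, the historical bank itself can live outside GPU memory, which is consistent with the abstract's claim that the corpus can stay on a filesystem during training.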
Problem

Research questions and friction points this paper is trying to address.

nonstationarity
classification
distribution shift
time series prediction
historical data
Innovation

Methods, ideas, or system contributions that make the work stand out.

nonstationary classification
learned retrieval
time series prediction
discrete retrieval mechanism
distribution shift robustness
Jimmy Gammell
Purdue University, Elmore Family School of Electrical and Computer Engineering
Bishal Thapaliya
Amazon
Yoon Jung
Amazon
Riyasat Ohib
Georgia Institute of Technology
Bilel Fehri
Amazon
Deepayan Chakrabarti
Associate Professor, McCombs School of Business, UT Austin
Machine Learning · Graph Mining · Robust Optimization · Big Data