SLOT: Sample-specific Language Model Optimization at Test-time

📅 2025-05-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Contemporary large language models generalize poorly on complex or long-tail instructions. To address this, the authors propose SLOT (Sample-specific Language Model Optimization at Test-time), a lightweight, sample-specific test-time adaptation method: for each input, the final hidden-layer features are cached once; a learnable vector, accounting for less than 0.1% of the model's parameters, is added to these features and updated over a few optimization steps via a cross-entropy loss on the input prompt, without backpropagating gradients through the backbone network. This yields parameter-minimal, sample-level adaptation with no backbone backpropagation. Experiments show substantial gains: Qwen2.5-7B improves by 8.6 percentage points on GSM8K (57.54% → 66.19%), and DeepSeek-R1-Distill-Llama-70B reaches state-of-the-art accuracy among 70B-scale models on GPQA (68.69%).
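The core mechanism described above can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the authors' implementation: `H` stands for the cached last-layer prompt features, `W` for the frozen output head, and the learned `delta` is the sample-specific vector; all names and the plain gradient-descent loop are our assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prompt_ce(H, W, delta, targets):
    # Mean next-token cross-entropy over the prompt positions.
    probs = softmax((H + delta) @ W)
    idx = np.arange(len(targets))
    return -np.mean(np.log(probs[idx, targets] + 1e-12))

def slot_adapt(H, W, targets, steps=10, lr=0.1):
    # Learn one d-dimensional vector added to every cached last-layer
    # feature row; H (prompt features) and W (output head) stay frozen,
    # so no gradients flow into the backbone.
    T, d = H.shape
    delta = np.zeros(d)
    for _ in range(steps):
        probs = softmax((H + delta) @ W)      # (T, V) token probabilities
        probs[np.arange(T), targets] -= 1.0   # = T * dLoss/dlogits
        grad = (probs @ W.T).sum(axis=0) / T  # chain rule through W
        delta -= lr * grad
    return delta
```

Because the loss is linear in `delta` before the softmax, each step only reuses the cached `H`; the expensive backbone forward pass runs once per sample.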

📝 Abstract
We propose SLOT (Sample-specific Language Model Optimization at Test-time), a novel and parameter-efficient test-time inference approach that enhances a language model's ability to respond accurately to individual prompts. Existing Large Language Models (LLMs) often struggle with complex instructions, leading to poor performance on prompts that are not well represented in their training data. To address this, SLOT conducts a few optimization steps at test time to update a lightweight, sample-specific parameter vector. The vector is added to the final hidden layer before the output head, and caching the last-layer features during per-sample optimization keeps the adaptation efficient. By minimizing the cross-entropy loss on the input prompt only, SLOT helps the model better align with and follow each given instruction. In experiments, our method outperforms the compared baselines across multiple benchmarks and LLMs. For example, Qwen2.5-7B with SLOT achieves an accuracy gain of 8.6 percentage points on GSM8K, from 57.54% to 66.19%, while DeepSeek-R1-Distill-Llama-70B with SLOT achieves a SOTA accuracy of 68.69% on GPQA among 70B-level models. Our code is available at https://github.com/maple-research-lab/SLOT.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LM accuracy on individual prompts
Optimizing lightweight per-sample parameters at test time
Improving how LLMs handle complex instructions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-step test-time optimization for LM adaptation
Lightweight, sample-specific parameter-vector update
Cross-entropy minimization on the input prompt only
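At inference, the adapted vector is simply added to each fresh last-layer feature before the output head. A hedged sketch under our own assumptions (the names `step_features`, `W`, and `delta` are ours, and greedy argmax decoding is chosen purely for illustration):

```python
import numpy as np

def decode_greedy(step_features, W, delta):
    # Greedy decoding with the sample-specific vector applied to every
    # new last-layer feature; the frozen head W maps features to logits.
    token_ids = []
    for h in step_features:       # one (d,) feature vector per step
        logits = (h + delta) @ W  # (V,) adapted logits
        token_ids.append(int(np.argmax(logits)))
    return token_ids
```

The per-sample cost at decode time is one vector addition per step; the backbone and head are untouched.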
Authors

Yang Hu, Westlake University
Xingyu Zhang, Horizon Robotics Inc.
Xueji Fang, Zhejiang University
Zhiyang Chen, Westlake University
Xiao Wang, University of Washington
Huatian Zhang, University of Science and Technology of China
Guojun Qi, Westlake University