🤖 AI Summary
Transformer language models face quadratic computational and memory overhead with increasing context length. This work proposes Extend at Test-Time (ETT), a lightweight method that extends context capacity by fine-tuning the model at test time on overlapping sliding-window subsequences derived by chunking the input, updating only the second-layer parameters of the feed-forward networks (FFNs). ETT achieves a constant memory footprint and computational cost linear in sequence length, scaling context length up to 32× (e.g., from 1K to 32K tokens). On LongBench, it improves accuracy by up to 30% for GPT-Large and Phi-2, outperforming full-parameter fine-tuning. The core contributions are the identification of the FFN's second layer as the key locus for storing contextual information in the weights, and the demonstration of a lightweight, scalable test-time context-extension paradigm.
📝 Abstract
Transformer-based Language Models' computation and memory overhead grow quadratically as a function of sequence length. This quadratic cost poses challenges when employing LLMs to process long sequences. In this work, we introduce ETT (Extend at Test-Time), a method for extending the context length of short-context Transformer-based LLMs with a constant memory requirement and linear computation overhead. ETT enables extension of the context length at test time by efficiently fine-tuning the model's parameters on the input context, chunked into small overlapping subsequences. We evaluate ETT on LongBench by extending the context length of GPT-Large and Phi-2 up to 32 times, from 1k to 32k tokens, which yields up to a 30 percent improvement in the models' accuracy. We also study how context can be stored in an LLM's weights effectively and efficiently. Through a detailed ablation study, we examine which Transformer modules are most beneficial to fine-tune at test time. Interestingly, we find that fine-tuning only the second layer of the FFNs is more effective than full fine-tuning, leading to a further improvement in the models' accuracy.
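The two mechanical ingredients described above — chunking the input into overlapping subsequences, and restricting test-time updates to the second FFN layer — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window/overlap sizes and the parameter-name pattern (`mlp.fc2`) are assumptions standing in for whatever the actual model and training loop use.

```python
def chunk_with_overlap(tokens, window_size, overlap):
    """Split a token sequence into overlapping sliding windows.

    Each chunk is at most `window_size` tokens, and consecutive
    chunks share `overlap` tokens, so the model sees every token
    while fine-tuning only on short subsequences.
    """
    assert 0 <= overlap < window_size
    stride = window_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # last window already covers the tail
    return chunks


def trainable_param_names(all_param_names, pattern="mlp.fc2"):
    """Select only second-layer FFN parameters for test-time tuning.

    `pattern` is a hypothetical naming convention (e.g. GPT-style
    `mlp.fc2`); real checkpoints may name this layer differently.
    """
    return [name for name in all_param_names if pattern in name]


# Example: a 10-token context, 4-token windows overlapping by 2.
tokens = list(range(10))
chunks = chunk_with_overlap(tokens, window_size=4, overlap=2)
# chunks == [[0,1,2,3], [2,3,4,5], [4,5,6,7], [6,7,8,9]]

# Example: filtering a (mock) parameter list down to second FFN layers.
names = ["h.0.attn.qkv", "h.0.mlp.fc1", "h.0.mlp.fc2",
         "h.1.attn.qkv", "h.1.mlp.fc1", "h.1.mlp.fc2"]
tuned = trainable_param_names(names)
# tuned == ["h.0.mlp.fc2", "h.1.mlp.fc2"]
```

In an actual test-time tuning loop one would freeze all parameters except those selected by `trainable_param_names` and run a few gradient steps of the language-modeling loss on each chunk; because each chunk is short, memory stays constant and total compute scales linearly with sequence length.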