ETT: Expanding the Long Context Understanding Capability of LLMs at Test-Time

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformer language models face quadratic computational and memory overhead with increasing context length. This work proposes ETT (Extend at Test-Time), a lightweight method that extends context capacity by fine-tuning overlapping sliding-window subsequences—derived via input chunking—at test time, updating only the second-layer parameters of feed-forward networks (FFNs). ETT achieves a constant memory footprint and linear computational complexity in sequence length, scaling context length up to 32× (e.g., from 1K to 32K tokens). On LongBench, it improves accuracy by up to 30% for GPT-Large and Phi-2, substantially outperforming full-parameter fine-tuning. The core contribution is the identification of the FFN's second layer as the critical bottleneck for contextual information storage, and the first demonstration of a lightweight, scalable, and high-yield test-time context enhancement paradigm.
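The summary's key ablation finding—that updating only the second linear layer of each FFN block beats full fine-tuning—amounts to a parameter-selection step before test-time tuning. A minimal sketch of that filter, assuming a hypothetical "mlp.fc2" naming convention (real checkpoints differ, e.g. "mlp.c_proj" in GPT-2-style models):

```python
def select_ett_params(param_names):
    """Return the parameter names ETT would update at test time:
    only the second linear layer of each FFN block.
    The "mlp.fc2" naming is an illustrative assumption."""
    return [
        n for n in param_names
        if n.endswith("mlp.fc2.weight") or n.endswith("mlp.fc2.bias")
    ]

# Hypothetical parameter names for a two-block Transformer.
names = [
    "layers.0.attn.qkv.weight",
    "layers.0.mlp.fc1.weight",
    "layers.0.mlp.fc2.weight",
    "layers.0.mlp.fc2.bias",
    "layers.1.mlp.fc1.weight",
    "layers.1.mlp.fc2.weight",
]
selected = select_ett_params(names)
```

Everything outside the selected set stays frozen, which is what keeps the memory footprint of the test-time updates constant regardless of input length.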

📝 Abstract
Transformer-based language models' computation and memory overhead increase quadratically with sequence length. This quadratic cost poses challenges when employing LLMs to process long sequences. In this work, we introduce ETT (Extend at Test-Time), a method for extending the context length of short-context Transformer-based LLMs with constant memory requirements and linear computation overhead. ETT enables the extension of the context length at test time by efficiently fine-tuning the model's parameters on the input context, chunked into overlapping small subsequences. We evaluate ETT on LongBench by extending the context length of GPT-Large and Phi-2 up to 32 times, increasing from 1k to 32k tokens. This results in up to a 30 percent improvement in the models' accuracy. We also study how context can be stored in an LLM's weights effectively and efficiently. Through a detailed ablation study, we examine which Transformer modules are most beneficial to fine-tune at test time. Interestingly, we find that fine-tuning the second layer of the FFNs is more effective than full fine-tuning, leading to a further improvement in the models' accuracy.
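The abstract's chunking step—splitting the input into overlapping small subsequences so the number of windows grows linearly with sequence length—can be sketched as follows (window and overlap sizes here are illustrative assumptions, not values from the paper):

```python
def chunk_with_overlap(tokens, window, overlap):
    """Split a token sequence into overlapping fixed-size windows,
    as in ETT-style test-time chunking. Returns a list of chunks;
    each consecutive pair shares `overlap` tokens."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    stride = window - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):  # last window reached the end
            break
    return chunks

# Toy example: a 10-token input, window of 4, overlap of 2.
chunks = chunk_with_overlap(list(range(10)), window=4, overlap=2)
```

Because the stride is fixed, the chunk count scales linearly with input length, matching the linear computation overhead the abstract claims, while each fine-tuning step only ever sees one fixed-size window.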
Problem

Research questions and friction points this paper is trying to address.

Extend context length of LLMs with linear overhead
Improve long sequence processing accuracy
Optimize fine-tuning for efficient context storage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends context length with linear computation overhead
Efficient fine-tuning on overlapping subsequences
Optimizes FFN layers for better accuracy
Kiarash Zahirnia
Machine Learning Researcher, SFU
Zahra Golpayegani
Ascend Team, Toronto Research Center, Huawei Technologies
Walid Ahmad
Ascend Team, Toronto Research Center, Huawei Technologies
Yang Liu
Ascend Team, Toronto Research Center, Huawei Technologies