Spectral Attention Steering for Prompt Highlighting

๐Ÿ“… 2026-03-01
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
This work proposes a training-free attention steering method that avoids storing the full attention matrix, preserving compatibility with memory-efficient implementations such as FlashAttention. By editing key embeddings via spectral decomposition before the attention computation, the approach lets the model focus precisely on user-specified text. An adaptive routing mechanism dynamically fuses multiple expert subspaces, yielding low-overhead, high-fidelity prompt highlighting. Experiments show the method significantly outperforms strong baselines on standard steering benchmarks while reducing both latency and memory consumption.

๐Ÿ“ Abstract
Attention steering is an important technique for controlling model focus, enabling capabilities such as prompt highlighting, where the model prioritises user-specified text. However, existing attention steering methods require explicit storage of the full attention matrix, making them incompatible with memory-efficient implementations like FlashAttention. We introduce Spectral Editing Key Amplification (SEKA), a training-free steering method that addresses this limitation by directly editing key embeddings before the attention computation. SEKA uses spectral decomposition to steer key embeddings towards latent directions that amplify attention scores for selected tokens. We extend this to Adaptive SEKA (AdaSEKA), a query-adaptive variant that uses a training-free routing mechanism to dynamically combine multiple expert subspaces based on the prompt's semantic intent. Our experiments show that both methods significantly outperform strong baselines on standard steering benchmarks while incurring far lower latency and memory overhead, and remain compatible with optimised attention implementations.
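The abstract's core idea, editing key embeddings along spectral directions before attention so that highlighted tokens score higher, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the paper's actual algorithm: the function name `seka_edit_keys`, the choice of top right-singular vectors of the key matrix as the steering subspace, and the amplification scheme are all hypothetical stand-ins. Because the edit happens on the keys before the softmax, the downstream attention kernel (e.g. FlashAttention) needs no modification.

```python
import numpy as np

def seka_edit_keys(K, highlight_idx, alpha=1.0, rank=1):
    """Hypothetical sketch of spectral key editing (SEKA-style).

    K:             (seq_len, d) key embeddings for one head
    highlight_idx: indices of tokens to emphasise
    alpha:         amplification strength
    rank:          number of spectral directions to amplify along
    """
    # Spectral decomposition of the key matrix: the top right-singular
    # vectors span the dominant latent directions of the keys.
    _, _, Vt = np.linalg.svd(K, full_matrices=False)
    V = Vt[:rank]                                # (rank, d) steering subspace

    K_edit = K.copy()
    # Amplify highlighted keys along that subspace, boosting the component
    # that contributes most to their dot products with typical queries.
    proj = K[highlight_idx] @ V.T @ V            # component inside subspace
    K_edit[highlight_idx] = K[highlight_idx] + alpha * proj
    return K_edit

# Usage: edit keys, then run any memory-efficient attention as usual.
rng = np.random.default_rng(0)
K = rng.standard_normal((8, 16))
K_edited = seka_edit_keys(K, highlight_idx=[2, 5], alpha=0.5)
```

Only the rows in `highlight_idx` change; the edit is a cheap rank-`rank` update per highlighted token, which is where the claimed low latency and memory overhead would come from.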
Problem

Research questions and friction points this paper is trying to address.

attention steering
prompt highlighting
memory efficiency
FlashAttention
key embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

attention steering
spectral decomposition
training-free
memory-efficient attention
prompt highlighting