Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly

📅 2025-11-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large reasoning models (LRMs) frequently generate meaningless repetitive text, termed "word salad," during decoding, which wastes a substantial share of the decoding budget. To address this, we propose a lightweight, plug-and-play mechanism that detects and prunes word salad in real time. The method uses the hidden states at `<\n\n>` tokens, which trail each reasoning chunk, as self-awareness signals: a single-layer linear classifier runs online over these states to flag repetitive generation. Once word salad is detected, the output is chopped at that point and a regeneration prompt is appended. The approach is minimally invasive, requires no fine-tuning of the underlying LRM, and preserves output coherence while significantly compressing output length (average reduction >30%) with negligible quality degradation. Experiments demonstrate substantial decoding-budget savings across diverse LRMs and reasoning tasks. The implementation is open-sourced.

📝 Abstract

Large Reasoning Models (LRMs) are often bottlenecked by the high cost of output tokens. We show that a significant portion of these tokens are useless self-repetitions - what we call "word salad" - that exhaust the decoding budget without adding value. Interestingly, we observe that LRMs are self-aware when trapped in these loops: the hidden states of <\n\n> tokens trailing each reasoning chunk exhibit patterns that allow us to detect word salad behavior on-the-fly via a single-layer linear classifier. Once word salad is detected, a simple chop followed by a straightforward regeneration prompt yields substantial length savings with minimal quality loss. Our work offers WordSaladChopper (WSC) - a lightweight, turnkey component for LRMs that is minimally invasive to the reasoning trajectory because it removes only semantically redundant tokens. Given its low overhead, strong savings, and the lack of semantic value of word salad tokens, we believe it is not too far-fetched to argue that WSC - or a similar component - is a must-have for all LRM applications with user experience in mind. Our code is publicly available at https://github.com/wenyaxie023/WordSaladChopper.
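
The detection step described in the abstract can be pictured with a small sketch. This is an illustration, not the released WSC code: it assumes each reasoning chunk has already been reduced to one hidden-state vector (taken at its trailing delimiter token), that a linear probe with weights `weights` and bias `bias` has been trained separately, and that a chop is triggered only after `patience` consecutive flagged chunks. The names `probe_score`, `first_chop_index`, `threshold`, and `patience` are hypothetical, and the consecutive-hit rule is an assumed trigger rather than one stated in the abstract.

```python
import math
from typing import Optional, Sequence


def probe_score(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Single-layer linear classifier: sigmoid(w . h + b)."""
    logit = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def first_chop_index(
    chunk_states: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,   # assumed decision threshold
    patience: int = 2,        # assumed number of consecutive hits before chopping
) -> Optional[int]:
    """Return the index of the first chunk in the first run of `patience`
    consecutive chunks flagged as word salad, or None if no such run exists."""
    run = 0
    for i, hidden in enumerate(chunk_states):
        if probe_score(hidden, weights, bias) > threshold:
            run += 1
            if run >= patience:
                return i - patience + 1
        else:
            run = 0  # a clean chunk resets the streak
    return None


# Toy 2-dimensional hidden states; real hidden states have thousands of dimensions.
toy_states = [[-2.0, 0.0], [3.0, 0.0], [-1.0, 0.0], [2.0, 0.0], [4.0, 0.0], [5.0, 0.0]]
chop_at = first_chop_index(toy_states, weights=[1.0, -1.0], bias=0.0)
```

In this toy run the single flagged chunk at index 1 does not trigger a chop because the next chunk is clean; the streak starting at index 3 does.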

Problem

Research questions and friction points this paper is trying to address.

LRMs waste decoding budget on useless repetitive tokens

Detect word salad patterns via hidden state analysis

Remove redundant tokens to save length with minimal quality loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects word salad via hidden state patterns

Uses linear classifier for on-the-fly detection

Chops redundant tokens and regenerates content
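
The chop-and-regenerate step listed above can be sketched at the text level. This is a minimal illustration under stated assumptions, not the authors' implementation: reasoning chunks are assumed to be separated by a blank line, `chop_index` is assumed to come from a detector that has already run, and the wording of `REGEN_PROMPT` is a hypothetical placeholder rather than the prompt used in the paper. The returned string would be handed back to the model as the prefix to continue from.

```python
# Hypothetical placeholder wording; the prompt used by the authors is not given here.
REGEN_PROMPT = "Let me restart from the last useful step and finish concisely."


def chop_and_prompt(
    reasoning: str,
    chop_index: int,
    regen_prompt: str = REGEN_PROMPT,
    delimiter: str = "\n\n",  # assumed chunk separator
) -> str:
    """Keep the chunks before `chop_index`, drop the rest, and append the
    regeneration prompt as the new final chunk."""
    chunks = reasoning.split(delimiter)
    kept = chunks[:chop_index]
    return delimiter.join(kept + [regen_prompt])


trace = "step 1\n\nstep 2\n\nwait, wait\n\nwait, wait"
new_prefix = chop_and_prompt(trace, chop_index=2)
```

Because only the flagged tail is dropped, the earlier reasoning is left untouched, which matches the abstract's description of the method as minimally invasive.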
